
    Building Video Game Recommender Systems with FastAPI, PostgreSQL, and Render: Part 1

    By ProfitlyAI | September 25, 2025


    Introduction

    Recommender systems allow an application to generate intelligent suggestions for a user, effectively sorting relevant content out from the rest. In this article, we build and deploy a dynamic video game recommender system leveraging PostgreSQL, FastAPI, and Render to recommend new games to a user based on those they’ve interacted with. The intent is to provide a clear example of how a standalone recommender system can be built, which can then be tied into a front-end system or other application.

    For this project, we use video game data available from Steam’s API, but this could easily be replaced by whatever product data you’re interested in; the key steps will be the same. We’ll cover how to store this data in a database, vectorize the game tags, generate similarity scores based on the games a user has interacted with, and return a series of relevant recommendations. By the end of this article, we’ll have this recommender system deployed as a web application with FastAPI such that every time a user interacts with a new game, we can dynamically generate and store a new set of recommendations for that user.

    The following tools will be used:

    • PostgreSQL
    • FastAPI
    • Docker
    • Render

    Those interested in the GitHub repository can find it here.

    Table of Contents

    Due to the length of this project, it’s divided into two articles. The first portion covers the setup and theory behind this project (steps 1–5 shown below), and the second half covers deploying it. If you’re looking for the second half, it’s located here.

    Part 1

    1. Dataset Overview
    2. Overall System Architecture
    3. Database Setup
    4. FastAPI Setup
      – Models
      – Routes
    5. Building Similarity Pipeline
    Part 2

    1. Deploying a PostgreSQL database on Render
    2. Deploying a FastAPI app as a Render Web Application
      – Dockerizing our application
      – Pushing Docker Image to DockerHub
      – Pulling from DockerHub to Render

    Dataset Overview

    The dataset for this project contains data for the top ~2000 games from the Steamworks API. This data is free and licensed for personal and commercial use, subject to the terms of service. There is a rate limit of 200 requests per 5 minutes, which resulted in us working with only a subset of the data. The terms of service can be found here.
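
    The article doesn’t reproduce its ingestion script, but staying under the limit simply means pacing requests: 200 requests per 5 minutes works out to one call every 1.5 seconds. Below is a minimal sketch of polite fetching against Steam’s appdetails endpoint; the endpoint is real, but the appid list and output handling here are illustrative assumptions rather than the project’s actual code.

    import time
    import requests

    APP_IDS = [570, 730, 440]  # placeholder appids; the real script iterates ~2000 games
    RATE_LIMIT_DELAY = 300 / 200  # 200 requests / 5 minutes -> 1.5 s between calls

    games = []
    for appid in APP_IDS:
        resp = requests.get(
            "https://store.steampowered.com/api/appdetails",
            params={"appids": appid},
            timeout=10,
        )
        resp.raise_for_status()
        payload = resp.json()[str(appid)]
        if payload.get("success"):
            games.append(payload["data"])  # name, genres, categories, etc.
        time.sleep(RATE_LIMIT_DELAY)  # stay under the published rate limit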

    An overview of the games dataset is shown below. Most of the fields are relatively self-descriptive; the key thing to note is that the unique product identifier is appid. In addition to this dataset, we also have several more tables that we’ll detail below; the most important one for our recommender system is a game tags table, which contains the appid values mapped to each tag associated with the game (strategy, RPG, card game, and so on). These were drawn from the categories field shown in the Data Overview and then pivoted to create the game_tags table so that there is a unique row for each appid:category combination.

    Figure 2: Dataset Overview

    For a more detailed overview of the structure of our project, see the diagram below.

    Figure 3: Project File Structure

    Now we’ll provide a quick overview of the architecture of this project and then dive into how to populate our database.

    Architecture

    For our recommender system, we’ll use a PostgreSQL database with a FastAPI data access and processing layer that will allow us to add or remove games from a user’s game list. A user making changes to their game library, via a FastAPI POST request, will also kick off a recommendation pipeline leveraging FastAPI’s Background Tasks feature; this pipeline will query their liked games from the database, calculate a similarity score with non-liked games, and update a user_recommendation table with their new top-N recommended games. Finally, both the PostgreSQL database and FastAPI service will be deployed on Render so they can be accessed beyond our local environment. For this deployment step, any cloud service could have been used, but we chose Render in this case for its simplicity.

    To recap, our overall workflow from the user’s perspective will look like this:

    1. The user adds a game to their library by making a POST request to our FastAPI service, which writes it to our database (a minimal request sketch follows the diagram below).
      • If we wanted to attach our recommender system to a front-end application, we could easily tie this POST API into a user interface.
    2. This POST request kicks off a FastAPI background task that runs our recommender pipeline.
    3. The recommender pipeline queries our database for the user’s game list and the global games list.
    4. A similarity score is then calculated between the user’s games and all games using our game tags.
    5. Finally, our recommender pipeline makes a POST request to the database to update the recommended games table for that user.
    Figure 4: Recommender System Diagram
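
    As a concrete illustration of step 1, here is a minimal client-side sketch, assuming the service is running locally on port 8000; it uses the /api/v1/user_game/ route defined later in this article, and the username and appid values are made up.

    import requests

    # 'Liking' a game: POST a new user_game entry, which also triggers
    # the recommendation pipeline as a FastAPI background task
    payload = {"username": "demo_user", "appid": "570", "rating": 5.0}
    resp = requests.post("http://127.0.0.1:8000/api/v1/user_game/", json=payload)
    resp.raise_for_status()
    print(resp.json())  # the stored user_game row echoed back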

    Setting Up the Database

    Before we build our recommender system, the first step is to set up our database. Our basic database diagram is shown in Figure 5. We previously discussed our game table above; this is the base dataset that the rest of our data is generally derived from. A full list of our tables is here:

    • Game Table: Contains base game data for each unique game in our database
    • User Table: A dummy user table populated with example information.
    • User_Game Table: Contains the mappings between all games that a user has ‘liked’; this table is one of the base tables used to generate recommendations by capturing what games a user is interested in.
    • Game_Tags Table: This contains an appid:game_tag mapping, where a game tag could be something like ‘strategy’, ‘rpg’, or ‘comedy’: a descriptive tag that captures part of the essence of a game. There are multiple tags mapped to each appid.
    • User_Recommendation Table: This is our target table that will be updated by our pipeline. Every time a user interacts with a new game, our recommendation pipeline will run and generate a new series of recommendations for that user that will be stored here.
    Figure 5: Database Diagram

    To set up these tables, we can simply run our src/load_database.py file. This file creates and populates our tables in a couple of steps that are outlined below. Note, right now we’re going to focus on understanding how to write this data to a generic database, so all you have to know now is that the External_Database_Url below is the URL to whatever database you want to use. In the second half of this article, we’ll walk through how to set up a database on Render and copy the URL into your .env file.

    from sqlalchemy import create_engine
    from sqlalchemy.orm import sessionmaker, Session
    from sqlalchemy.ext.declarative import declarative_base
    import os
    from dotenv import load_dotenv
    from utils.db_handler import DatabaseHandler
    import pandas as pd
    import uuid
    import sys
    from sqlalchemy.exc import OperationalError
    import psycopg2
    
    # Loading environmental variables
    load_dotenv(override=True)
    
    # Construct PostgreSQL connection URL for Render
    URL_database = os.environ.get("External_Database_Url")
    
    # Initialize DatabaseHandler with our URL
    engine = DatabaseHandler(URL_database)
    
    # loading initial user data
    users_df = pd.read_csv("Data/users.csv")
    games_df = pd.read_csv("Data/games.csv")
    user_games_df = pd.read_csv("Data/user_games.csv")
    user_recommendations_df = pd.read_csv("Data/user_recommendations.csv")
    game_tags_df = pd.read_csv("Data/game_tags.csv")

    First, we load five CSV files into dataframes from our Data folder; we have one file for each of the tables shown in our database diagram. We also establish a connection to our database by declaring an engine variable; this engine variable uses a custom DatabaseHandler class, sketched below. This class takes a connection string to our database on Render (or your preferred cloud service), passed in from our .env file, and contains all of our database connect, update, delete, and test functionality.

    After loading our data and instantiating our DatabaseHandler class, we then have to define a query to create each of the five tables and execute these queries using the DatabaseHandler.create_table function. This is a very simple function that connects to our database, executes the query, and closes the connection, leaving us with the five tables shown in our database diagram; however, they are currently empty.
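
    The DatabaseHandler class itself lives in utils/db_handler.py in the project repository and isn’t reproduced in the article. As a minimal sketch of what its constructor, create_table, and delete_table plausibly look like, assuming a SQLAlchemy engine under the hood (the method bodies below are our assumption, not the repo’s actual code):

    from sqlalchemy import create_engine, text

    class DatabaseHandler:
        """Hypothetical sketch of utils/db_handler.py."""

        def __init__(self, database_url: str):
            # Hold a SQLAlchemy engine for all subsequent operations
            self.engine = create_engine(database_url)

        def create_table(self, creation_query: str) -> None:
            # Connect, execute the CREATE TABLE statement, and commit
            with self.engine.begin() as conn:
                conn.execute(text(creation_query))

        def delete_table(self, table_name: str) -> None:
            # Drop the table if it exists so the script can be re-run cleanly
            with self.engine.begin() as conn:
                conn.execute(text(f'DROP TABLE IF EXISTS "{table_name}"'))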

    # Defining queries to create tables
    user_table_creation_query = """CREATE TABLE IF NOT EXISTS users (
        id UUID PRIMARY KEY,
        username VARCHAR(255) UNIQUE NOT NULL,
        password VARCHAR(255) NOT NULL,
        email VARCHAR(255) NOT NULL,
        role VARCHAR(50) NOT NULL
        )
        """
    game_table_creation_query = """CREATE TABLE IF NOT EXISTS games (
        id UUID PRIMARY KEY,
        appid VARCHAR(255) UNIQUE NOT NULL,
        name VARCHAR(255) NOT NULL,
        type VARCHAR(255),
        is_free BOOLEAN DEFAULT FALSE,
        short_description TEXT,
        detailed_description TEXT,
        developers VARCHAR(255),
        publishers VARCHAR(255),
        price VARCHAR(255),
        genres VARCHAR(255),
        categories VARCHAR(255),
        release_date VARCHAR(255),
        platforms TEXT,
        metacritic_score FLOAT,
        recommendations INTEGER
        )
        """
    
    user_games_query = """CREATE TABLE IF NOT EXISTS user_games (
        id UUID PRIMARY KEY,
        username VARCHAR(255) NOT NULL,
        appid VARCHAR(255) NOT NULL,
        shelf VARCHAR(50) DEFAULT 'Wish_List',
        rating FLOAT DEFAULT 0.0,
        review TEXT
        )
        """
    recommendation_table_creation_query = """CREATE TABLE IF NOT EXISTS user_recommendations (
        id UUID PRIMARY KEY,
        username VARCHAR(255),
        appid VARCHAR(255),
        similarity FLOAT
        )
        """
    
    game_tags_creation_query = """CREATE TABLE IF NOT EXISTS game_tags (
        id UUID PRIMARY KEY,
        appid VARCHAR(255) NOT NULL,
        category VARCHAR(255) NOT NULL
        )
        """
    
    
    
    # Dropping any existing tables so the script can be re-run cleanly
    engine.delete_table('user_recommendations')
    engine.delete_table('user_games')
    engine.delete_table('game_tags')
    engine.delete_table('games')
    engine.delete_table('users')
    
    # Create tables
    engine.create_table(user_table_creation_query)
    engine.create_table(game_table_creation_query)
    engine.create_table(user_games_query)
    engine.create_table(recommendation_table_creation_query)
    engine.create_table(game_tags_creation_query)
    

    Following the initial table setup, we then run a quality check to ensure each of our datasets has the required ID column, populate the data from the dataframes into the appropriate table, and then test to ensure that the tables were populated correctly. The test_table function returns a dictionary of the form {'table_exists': True, 'table_has_data': True} if the setup worked correctly.

    # Ensuring each row of each dataframe has a unique ID
    if 'id' not in users_df.columns:
        users_df['id'] = [str(uuid.uuid4()) for _ in range(len(users_df))]
    if 'id' not in games_df.columns:
        games_df['id'] = [str(uuid.uuid4()) for _ in range(len(games_df))]
    if 'id' not in user_games_df.columns:
        user_games_df['id'] = [str(uuid.uuid4()) for _ in range(len(user_games_df))]
    if 'id' not in user_recommendations_df.columns:
        user_recommendations_df['id'] = [str(uuid.uuid4()) for _ in range(len(user_recommendations_df))]
    if 'id' not in game_tags_df.columns:
        game_tags_df['id'] = [str(uuid.uuid4()) for _ in range(len(game_tags_df))]
    
    # Populates the 5 tables with data from the dataframes
    engine.populate_table_dynamic(users_df, 'users')
    engine.populate_table_dynamic(games_df, 'games')
    engine.populate_table_dynamic(user_games_df, 'user_games')
    engine.populate_table_dynamic(user_recommendations_df, 'user_recommendations')
    engine.populate_table_dynamic(game_tags_df, 'game_tags')
    
    # Testing if the tables were created and populated correctly
    print(engine.test_table('users'))
    print(engine.test_table('games'))
    print(engine.test_table('user_games'))
    print(engine.test_table('user_recommendations'))
    print(engine.test_table('game_tags'))
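
    The populate_table_dynamic and test_table methods also belong to the repo’s DatabaseHandler; based on the behavior and return shape described above, a plausible sketch (again an assumption, shown as the methods might appear inside the class) is:

    import pandas as pd
    from sqlalchemy import inspect, text

    def populate_table_dynamic(self, df: pd.DataFrame, table_name: str) -> None:
        # Append the dataframe's rows into the (already created) table
        df.to_sql(table_name, self.engine, if_exists="append", index=False)

    def test_table(self, table_name: str) -> dict:
        # Report whether the table exists and whether it contains any rows
        exists = inspect(self.engine).has_table(table_name)
        has_data = False
        if exists:
            with self.engine.connect() as conn:
                row = conn.execute(text(f'SELECT 1 FROM "{table_name}" LIMIT 1')).first()
                has_data = row is not None
        return {"table_exists": exists, "table_has_data": has_data}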

    Getting Started with FastAPI

    Now that we have our database set up and populated, we need to build the methods to access, update, and delete data using FastAPI. FastAPI enables us to easily build standardized (and fast) APIs that allow interaction with our database. The FastAPI docs offer a great step-by-step tutorial that can be found here. At a high level, there are a number of great features that make FastAPI ideal for serving as the interaction layer between a database and a front-end application.

    1. Standardization: FastAPI allows us to define routes to interact with our tables in a standardized way using GET, POST, DELETE, UPDATE, etc. methods. This standardization enables us to build a data access layer in pure Python that can then interact with a wide variety of front-end applications. We simply call the API methods we want in the front end, regardless of what language it’s built in.
    2. Data Validation: As we’ll show below, we need to define a Pydantic data model for each object we interact with (think our games and users tables). The main advantage of this is that it ensures all our variables have defined data types; for example, if we define our Game object such that the rating field is of type float and a user tries to make a POST request to add a new entry with a rating of “great”, it won’t work (see the validation sketch after this list). This built-in data validation will help us prevent all sorts of data quality issues as our system scales.
    3. Asynchronous: FastAPI functions can run asynchronously, meaning one isn’t dependent on another finishing. This can significantly improve performance because we won’t have a fast task waiting on a slow one to complete.
    4. Swagger Docs Built In: FastAPI has a built-in UI that we can navigate to on localhost, enabling us to easily test and interact with our routes.
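
    To make point 2 concrete, here is a minimal sketch of Pydantic’s validation behavior, using a stripped-down stand-in for the UserGameModel defined later (the model below is illustrative, not the article’s full model):

    from pydantic import BaseModel, ValidationError

    class UserGameSketch(BaseModel):
        username: str
        appid: str
        rating: float  # a non-numeric rating will be rejected

    try:
        UserGameSketch(username="demo_user", appid="570", rating="great")
    except ValidationError as e:
        # FastAPI surfaces this kind of error as a 422 response automatically
        print(e)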

    FastAPI Models

    The FastAPI portion of our project relies on two main files: models.py, which defines the data models that we’ll be interacting with (games, users, and so on), and main.py, which defines our actual FastAPI app and contains our routes. In the context of FastAPI, routes define the different paths for processing requests. For example, we might have a /games path to request games from our database.

    First, let’s discuss our models.py file. In this file, we define all of our models. While we have different models for different objects, the general approach will be the same, so we’ll only discuss the games model, shown below, in detail. The first thing you’ll notice below is that we have two separate classes defined for our Game object: a GameModel class that inherits from the Pydantic base model, and a Game class that inherits from the SQLAlchemy declarative_base. The natural question then is, why do we have two classes for one data structure (our game data structure)?

    If we weren’t using an SQL database for this project and instead read each of our CSV files into a dataframe each time main.py was run, then we wouldn’t need the Game class, only the GameModel class. In this scenario, we’d read in our games.csv dataframe, and FastAPI would use the GameModel class to ensure datatypes were correctly adhered to.

    However, because we’re using an SQL database, it makes more sense to have separate classes for our API and our database, as the two classes have slightly different jobs. Our API class handles data validation, serialization, and optional fields, and our database class handles database-specific concerns like defining primary/foreign keys, defining which table the object maps to, and protecting secure data. To reiterate the last point, we might have sensitive fields in our database that are for internal consumption only, and we don’t want to expose them to a user through an API (password, for example). We can handle this concern by having a separate user-facing Pydantic class and an internal SQLAlchemy one.

    Below is an example of how this can be implemented for our games object; we have separate classes defined for our other tables, which can be found here; however, the general structure is the same.

    from pydantic import BaseModel
    from uuid import UUID, uuid4
    from typing import Optional
    from enum import Enum
    from sqlalchemy import Column, String, Float, Integer
    import sqlalchemy.dialects.postgresql as pg
    from sqlalchemy.dialects.postgresql import UUID as SA_UUID
    from sqlalchemy.ext.declarative import declarative_base
    import uuid
    from uuid import UUID

    # loading sql model
    from sqlmodel import Field, Session, SQLModel, create_engine, select

    # Initialize the base class for SQLAlchemy models
    Base = declarative_base()

    # This is the Game model for the database
    class Game(Base):
        __tablename__ = "optigame_products"  # Table name in the PostgreSQL database

        id = Column(pg.UUID(as_uuid=True), primary_key=True, default=uuid.uuid4, unique=True, nullable=False)
        appid = Column(String, unique=True, nullable=False)
        name = Column(String, nullable=False)
        type = Column(String, nullable=True)
        is_free = Column(pg.BOOLEAN, nullable=True, default=False)
        short_description = Column(String, nullable=True)
        detailed_description = Column(String, nullable=True)
        developers = Column(String, nullable=True)
        publishers = Column(String, nullable=True)
        price = Column(String, nullable=True)
        genres = Column(String, nullable=True)
        categories = Column(String, nullable=True)
        release_date = Column(String, nullable=True)
        platforms = Column(String, nullable=True)
        metacritic_score = Column(Float, nullable=True)
        recommendations = Column(Integer, nullable=True)

    class GameModel(BaseModel):
        id: Optional[UUID] = None
        appid: str
        name: str
        type: Optional[str] = None
        is_free: Optional[bool] = False
        short_description: Optional[str] = None
        detailed_description: Optional[str] = None
        developers: Optional[str] = None
        publishers: Optional[str] = None
        price: Optional[str] = None
        genres: Optional[str] = None
        categories: Optional[str] = None
        release_date: Optional[str] = None
        platforms: Optional[str] = None
        metacritic_score: Optional[float] = None
        recommendations: Optional[int] = None

        class Config:
            orm_mode = True  # Enable ORM mode to work with SQLAlchemy objects
            from_attributes = True  # Enable attribute access for SQLAlchemy objects
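
    As a quick illustration of why orm_mode/from_attributes matters, here is a sketch of converting a SQLAlchemy Game row into the user-facing GameModel; the Game instance is fabricated for illustration.

    # Convert a SQLAlchemy row into the user-facing Pydantic model
    db_game = Game(appid="570", name="Dota 2")  # hypothetical ORM instance
    api_game = GameModel.from_orm(db_game)      # works because orm_mode is enabled
    print(api_game.dict(exclude_none=True))     # -> {'appid': '570', 'name': 'Dota 2'}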

    Setting Up FastAPI Routes

    Once we have our models defined, we can then create methods to interact with those models and request data from the database (GET), add data to the database (POST), or remove data from the database (DELETE). Below is an example of how we can define a GET request for our games model. We have some initial setup at the beginning of our main.py to fetch the database URL and connect to it. Then we initialize our app and add middleware to define which URLs we’ll accept requests from. Because we’ll be deploying the FastAPI project on Render and sending requests to it from our local machine, the only origin we’re allowing is localhost port 8000. We then define our app.get method called fetch_products, which takes an appid input, queries our database for Game objects where appid is equal to our filtered appid, and returns those products.

    Note: the snippet below contains just the setup and first GET method; the rest are fairly similar and available in the repo, so we won’t give an in-depth explanation for each one here.

    from fastapi import FastAPI, Depends, HTTPException, BackgroundTasks
    from uuid import uuid4, UUID
    from sqlalchemy import create_engine
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import sessionmaker, Session
    from dotenv import load_dotenv
    import os

    # Load environment variables
    load_dotenv()

    # security imports
    from fastapi.middleware.cors import CORSMiddleware
    from fastapi.security import OAuth2PasswordBearer

    # custom imports
    from src.models import User, Game, GameModel, UserModel, UserGameModel, UserGame, GameSimilarity, GameSimilarityModel, UserRecommendation, UserRecommendationModel
    from src.similarity_pipeline import UserRecommendationService

    # Load the database connection string from environment variable or .env file
    DATABASE_URL = os.environ.get("Internal_Database_Url")

    # creating connection to the database
    engine = create_engine(DATABASE_URL)
    SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)
    Base = declarative_base()

    # Create the database tables (if they don't exist already)
    Base.metadata.create_all(bind=engine)

    # Dependency to get the database session
    def get_db():
        db = SessionLocal()
        try:
            yield db
        finally:
            db.close()

    # Initialize the FastAPI app
    app = FastAPI(title="Game Store API", version="1.0.0")

    # Add CORS middleware to permit requests
    origins = ["http://localhost:8000"]

    app.add_middleware(
        CORSMiddleware,
        allow_origins=origins,
        allow_credentials=True,
        allow_methods=["*"],
        allow_headers=["*"],
    )


    #-------------------------------------------------#
    # ----------PART 1: GET METHODS-------------------#
    #-------------------------------------------------#
    @app.get("/")
    async def root():
        return {"message": "Hello World"}


    @app.get("/api/v1/games/")
    async def fetch_products(appid: str = None, db: Session = Depends(get_db)):
        # Query the database using the SQLAlchemy Game model
        if appid:
            products = db.query(Game).filter(Game.appid == appid).all()
        else:
            products = db.query(Game).all()
        return [GameModel.from_orm(product) for product in products]
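
    The prose above also mentions DELETE routes; they aren’t shown in this excerpt, but a plausible sketch following the same pattern (the route path and semantics here are our assumption, not necessarily what the repo defines) is:

    @app.delete("/api/v1/games/{appid}")
    async def delete_product(appid: str, db: Session = Depends(get_db)):
        # Remove a game by its appid, returning 404 if it doesn't exist
        product = db.query(Game).filter(Game.appid == appid).first()
        if not product:
            raise HTTPException(status_code=404, detail="Game not found.")
        db.delete(product)
        db.commit()
        return {"deleted": appid}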

    Once we have our main.py defined, we can finally run it from our base project directory using the command below.

    uvicorn src.main:app --reload

    Once this is done, we can navigate to http://127.0.0.1:8000/docs and see the interactive FastAPI environment shown below. From this page, we can test any of the methods defined in our main.py file. In the case of our fetch_products function, we can pass it an appid and return any matching games from our database.

    Figure 6: FastAPI Swagger Docs
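
    The same route can also be exercised programmatically; a minimal example, assuming the server is running locally (the appid value is made up):

    import requests

    # Fetch all games matching a given appid
    print(requests.get("http://127.0.0.1:8000/api/v1/games/", params={"appid": "570"}).json())

    # Fetch every game in the database
    print(requests.get("http://127.0.0.1:8000/api/v1/games/").json())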

    Building Our Similarity Pipeline

    We now have our database set up and can access and update data via FastAPI; it’s now time to turn to the central feature of this project: a recommender pipeline. Recommender systems are a well-researched field, and we’re not adding any innovation here; however, this will offer a clear example of how to implement a basic recommender system using FastAPI.

    Getting Started — How Do We Recommend Products?

    If we think about the question “How would I recommend new products that a user will like?”, there are two approaches that make intuitive sense.

    1. Collaborative Recommendation Systems: If I have a series of users and a series of products, I can identify users with similar interests by their overall basket of products and then identify products ‘missing’ from a given user’s basket. For example, if I have users 1–3 and products A–C, users 1–2 like all three products, but user 3 has so far only liked products A + B, then I might recommend them product C. This logically makes sense; all three users have a high degree of overlap in products that they’ve liked, but product C is missing from user 3’s basket, so there is a high likelihood that they would like it as well. This approach of generating recommendations by comparing like users is known as collaborative filtering.
    2. Content-Based Recommendation System: If I have a series of products, I can identify products that are similar to products that a user has liked and recommend those. For example, if I have a series of tags for each game, I can convert each game’s series of tags into a vector of 1s and 0s and then use a similarity measure (in this case, cosine similarity) to measure the similarity between games based on their vectors (see the worked example after this list). Once I’ve done this, I can then return the top N most similar games to those liked by a user based on their similarity score.

    More on recommender systems can be found here.
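
    As a tiny worked example of the content-based approach, assume a tag vocabulary of [boardgame, deckbuilding, resource-management] and two hypothetical games; their tag vectors and cosine similarity can be computed like so:

    from sklearn.metrics.pairwise import cosine_similarity

    # Tag vocabulary: [boardgame, deckbuilding, resource-management]
    game_a = [[1, 1, 0]]  # boardgame + deckbuilding
    game_b = [[1, 0, 1]]  # boardgame + resource-management

    # cos = (1*1 + 1*0 + 0*1) / (sqrt(2) * sqrt(2)) = 0.5
    print(cosine_similarity(game_a, game_b))  # [[0.5]]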

    Because our initial dataset doesn’t have a large number of users, we don’t have the data required to suggest items based on user similarity, which is known as a cold start problem. For this reason, we’ll instead develop a content-based recommender system, as we have a significant amount of game data to work with.

    To build our pipeline, we have to tackle two challenges: (1) how do we go about calculating similarity scores for a user, and (2) how do we automate this to run every time a user makes an update to their games?

    We’ll go over how a similarity pipeline can be triggered every time a user makes a POST request by ‘liking’ a game, and then cover how to build the pipeline itself.

    Tying the Recommender Pipeline to FastAPI

    For now, imagine we have a recommendation service that will update our user_recommendation table. We want to make sure that this service is called every time a user updates their preferences. We can implement this in a couple of steps, as shown below; first, we define a generate_recommendations_background function, which is responsible for connecting to our database, running the similarity pipeline, and then closing the connection. Next, we need to ensure this is called when a user makes a POST request (i.e., likes a new game); to do this, we simply add the function call at the end of our create_user_game POST request function.

    The result of this workflow is that every time a user makes a POST request to our user_game table, they call the create_user_game function, add a new user_game object to the database, and then run the similarity pipeline as a background function.

    Note: The below POST method and helper function are stored in main.py with the rest of our FastAPI methods.

    # importing similarity pipeline
    from src.similarity_pipeline import UserRecommendationService

    # Background task function
    def generate_recommendations_background(username: str, database_url: str):
        """Background task to generate recommendations for a user"""
        # Create a new database session for the background task
        background_engine = create_engine(database_url)
        BackgroundSessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=background_engine)

        db = BackgroundSessionLocal()
        try:
            recommendation_service = UserRecommendationService(db, database_url)
            recommendation_service.generate_recommendations_for_user(username)
        finally:
            db.close()

    # POST method which calls the background task function
    @app.post("/api/v1/user_game/")
    async def create_user_game(user_game: UserGameModel, background_tasks: BackgroundTasks, db: Session = Depends(get_db)):
        # Check if the entry already exists
        existing = db.query(UserGame).filter_by(username=user_game.username, appid=user_game.appid).first()
        if existing:
            raise HTTPException(status_code=400, detail="User already has this game.")

        # Prepare data with defaults
        user_game_data = {
            "username": user_game.username,
            "appid": user_game.appid,
            "shelf": user_game.shelf if user_game.shelf is not None else "Wish_List",
            "rating": user_game.rating if user_game.rating is not None else 0.0,
            "review": user_game.review if user_game.review is not None else ""
        }
        if user_game.id is not None:
            user_game_data["id"] = UUID(str(user_game.id))

        # Save the user game to the database
        db_user_game = UserGame(**user_game_data)
        db.add(db_user_game)
        db.commit()
        db.refresh(db_user_game)

        # Trigger background task to generate recommendations for this user
        background_tasks.add_task(generate_recommendations_background, user_game.username, DATABASE_URL)

        return db_user_game

    Building the Recommender Pipeline

    Now that we understand how our similarity pipeline can be triggered when a user updates their liked games, it’s time to dive into the mechanics of how the recommendation pipeline works. Our recommendation pipeline is stored in similarity_pipeline.py and contains the UserRecommendationService class that we showed how to import and instantiate above. This class contains a series of helper functions that are ultimately all called in the generate_recommendations_for_user method. There are 7 main steps, which we’ll walk through one by one.

    1. Fetching a user’s games: To generate relevant game recommendations, we need to retrieve the games that a user has already added to their game basket. This is done by calling our fetch_user_games helper function. This function queries our user_games table, using the username of the user making the POST request as an input, and returns all games in their basket.
    2. Fetching game tags: To compare games, we need a dimension to compare them on, and that dimension is the tags associated with each game (strategy, board game, and so on). To retrieve the game:tag mapping, we call our fetch_all_game_tags function, which returns the tags for all the games in our database.
    3. Vectorizing game tags: To compare the similarity between games A and B, we first need to vectorize the game tags using our create_game_vectors function. This function takes a series of all tags in alphabetical order and checks whether each of the tags is associated with a given game. For example, if our entire set of tags was [boardgame, deckbuilding, resource-management] and game 1 just had the boardgame tag associated with it, then its vector would be [1, 0, 0].
    4. Creating our user vector: Once we have a vector representing each game, we then need an aggregate user vector to compare it to. To achieve this, we use our create_user_vector function, which generates an aggregate vector of the same length as our game vectors that we can then use to generate a similarity score between our user and every game.
    5. Calculate similarity: We use the vectors created in steps 3 and 4 in our calculate_user_recommendations function, which calculates a cosine similarity score ranging from 0–1, measuring the similarity between each game and our user’s aggregate games.
    6. Deleting old recommendations: Before we populate our user_recommendations table with new recommendations for a user, we first have to delete the old ones with delete_existing_recommendations. This deletes just the recommendations for the user who made the POST request; the others remain the same.
    7. Populate new recommendations: After deleting the old recommendations, we then populate the new ones with save_recommendations.
    
    
    from sqlalchemy.orm import Session
    from sqlalchemy import create_engine, text
    from src.models import UserGame, UserRecommendation
    from sklearn.metrics.pairwise import cosine_similarity
    import pandas as pd
    import uuid
    from typing import List
    import logging

    # Set up logging
    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger(__name__)

    class UserRecommendationService:
        def __init__(self, db_session: Session, database_url: str):
            self.db = db_session
            self.database_url = database_url
            self.engine = create_engine(database_url)

        def fetch_user_games(self, username: str) -> pd.DataFrame:
            """Fetch all games for a specific user"""
            query = text("SELECT username, appid FROM user_games WHERE username = :username")
            with self.engine.connect() as conn:
                result = conn.execute(query, {"username": username})
                data = result.fetchall()
                return pd.DataFrame(data, columns=['username', 'appid'])

        def fetch_all_game_tags(self) -> pd.DataFrame:
            """Fetch all game tags"""
            query = text("SELECT appid, category FROM game_tags")
            with self.engine.connect() as conn:
                result = conn.execute(query)
                data = result.fetchall()
                return pd.DataFrame(data, columns=['appid', 'category'])

        def create_game_vectors(self, tag_df: pd.DataFrame) -> tuple[pd.DataFrame, List[str], List[str]]:
            """Create game vectors from tags"""
            unique_tags = tag_df['category'].drop_duplicates().sort_values().tolist()
            unique_games = tag_df['appid'].drop_duplicates().sort_values().tolist()

            game_vectors = []
            for game in unique_games:
                tags = tag_df[tag_df['appid'] == game]['category'].tolist()
                vector = [1 if tag in tags else 0 for tag in unique_tags]
                game_vectors.append(vector)

            return pd.DataFrame(game_vectors, columns=unique_tags, index=unique_games), unique_tags, unique_games

        def create_user_vector(self, user_games_df: pd.DataFrame, game_vectors: pd.DataFrame, unique_tags: List[str]) -> pd.DataFrame:
            """Create user vector from their played games"""
            if user_games_df.empty:
                return pd.DataFrame([[0] * len(unique_tags)], columns=unique_tags, index=['unknown_user'])

            username = user_games_df.iloc[0]['username']
            user_games = user_games_df['appid'].tolist()

            # Only keep games that exist in game_vectors
            user_games = [g for g in user_games if g in game_vectors.index]

            if not user_games:
                user_vector = [0] * len(unique_tags)
            else:
                played_game_vectors = game_vectors.loc[user_games]
                user_vector = played_game_vectors.mean(axis=0).tolist()

            return pd.DataFrame([user_vector], columns=unique_tags, index=[username])

        def calculate_user_recommendations(self, user_vector: pd.DataFrame, game_vectors: pd.DataFrame, top_n: int = 20) -> pd.DataFrame:
            """Calculate similarity between the user vector and all game vectors"""
            username = user_vector.index[0]
            user_vector_data = user_vector.iloc[0].values.reshape(1, -1)

            # Calculate similarities
            similarities = cosine_similarity(user_vector_data, game_vectors)
            similarity_df = pd.DataFrame(similarities.T, index=game_vectors.index, columns=[username])

            # Get top N recommendations
            top_games = similarity_df[username].nlargest(top_n)

            recommendations = []
            for appid, similarity in top_games.items():
                recommendations.append({
                    "username": username,
                    "appid": appid,
                    "similarity": float(similarity)
                })

            return pd.DataFrame(recommendations)

        def delete_existing_recommendations(self, username: str):
            """Delete existing recommendations for a user"""
            self.db.query(UserRecommendation).filter(UserRecommendation.username == username).delete()
            self.db.commit()

        def save_recommendations(self, recommendations_df: pd.DataFrame):
            """Save new recommendations to the database"""
            for _, row in recommendations_df.iterrows():
                recommendation = UserRecommendation(
                    id=uuid.uuid4(),
                    username=row['username'],
                    appid=row['appid'],
                    similarity=row['similarity']
                )
                self.db.add(recommendation)
            self.db.commit()

        def generate_recommendations_for_user(self, username: str, top_n: int = 20):
            """Main method to generate recommendations for a specific user"""
            try:
                logger.info(f"Starting recommendation generation for user: {username}")

                # 1. Fetch the user's games
                user_games_df = self.fetch_user_games(username)
                if user_games_df.empty:
                    logger.warning(f"No games found for user: {username}")
                    return

                # 2. Fetch all game tags
                tag_df = self.fetch_all_game_tags()
                if tag_df.empty:
                    logger.error("No game tags found in database")
                    return

                # 3. Create game vectors
                game_vectors, unique_tags, unique_games = self.create_game_vectors(tag_df)

                # 4. Create the user vector
                user_vector = self.create_user_vector(user_games_df, game_vectors, unique_tags)

                # 5. Calculate recommendations
                recommendations_df = self.calculate_user_recommendations(user_vector, game_vectors, top_n)

                # 6. Delete existing recommendations
                self.delete_existing_recommendations(username)

                # 7. Save new recommendations
                self.save_recommendations(recommendations_df)

                logger.info(f"Successfully generated {len(recommendations_df)} recommendations for user: {username}")

            except Exception as e:
                logger.error(f"Error generating recommendations for user {username}: {str(e)}")
                self.db.rollback()
                raise

    Wrapping Up

    In this article, we covered how to set up a PostgreSQL database and FastAPI application to run a game recommender system. However, we haven’t yet gone over how to deploy this system to a cloud service to allow others to interact with it. For part two, covering exactly this, read on in Part 2.

    Figures: All images, unless otherwise noted, are by the author.

    Links

    1. GitHub Repository for Project: https://github.com/pinstripezebra/recommender_system
    2. FastAPI Docs: https://fastapi.tiangolo.com/tutorial/
    3. Recommender Systems: https://en.wikipedia.org/wiki/Recommender_system
    4. Cosine Similarity: https://en.wikipedia.org/wiki/Cosine_similarity


