Upvote/Downvote Weighted Ranking System in Laravel Using Bayesian Formula
Updated on October 16,2022
Recently, I was developing a Q & A application that required me to rate answers based on the number of likes and dislikes received. The plan was to ensure that answers with the most likes and least dislikes appeared at the top. This got me thinking. How could I make the ranking system efficient, unbiased and dynamic? After a few hours of research, I bumped into an article by Markus Weichselbaum on the Bayesian rating that saved me a great deal. Spare a few minutes and go through the article to understand what Bayesian rating is and why it is the method of choice.
In this tutorial, we are going to discuss how to implement the backend for the weighted rating system in Laravel using the Bayesian formula. No background in Laravel is required as one can easily copy the code and use it in any other PHP project. We will create a simple Question Answer application to demonstrate the concept.
Creating Migrations and Models
We need to create models and migrations for the four tables we will use in this tutorial. The four tables are questions, answers, users and votes. In Laravel ,authentication is very simple. Everything is configured to work out of the box. In the console, run the following command:
php artisan make:auth
This command will create the authentication views, a user model, the routes for the authentication endpoints as well as the migration for users table.
Not using Laravel? No problem. Below is the users' table schema:
CREATE TABLE `users` (
`id` bigint(20) UNSIGNED NOT NULL,
`name` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`email` varchar(191) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`email_verified_at` timestamp NULL DEFAULT NULL,
`password` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`remember_token` varchar(100) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
Next, let’s create the models and migrations for the other three tables using the commands below:
php artisan make:model Question -m
php artisan make:model Answer -m
php artisan make:model Vote -m
The commands will create Question, Answer and Vote models respectively. The -m option will automatically create a migration for us eliminating the need to run the “create migration” command.
Update database/migrations/xxxx_xx_xx_xxxxxx_reate_questions_table.php file and add three fields:
public function up(){
Schema::create('questions', function (Blueprint $table) {
$table->increments('id');
$table->string('question');
$table->integer('is_deleted')->default(0);
$table->text('slug', 65535);
$table->timestamps();
});
}
Questions table schema:
CREATE TABLE `questions` (
`id` int(10) UNSIGNED NOT NULL,
`question` text COLLATE utf8mb4_unicode_ci NOT NULL,
`is_deleted` int(11) NOT NULL DEFAULT '0',
`slug` text COLLATE utf8mb4_unicode_ci NOT NULL,
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
Update database/migrations/xxxx_xx_xx_xxxxxx_create_answers_table.php file and add seven fields:
Schema::create('answers', function (Blueprint $table) {
$table->increments('id');
$table->text('answer', 65535)->nullable();
$table->integer('is_deleted')->default(0);
$table->integer('question_id')->nullable();
$table->integer('rank')->default(1);
$table->integer('user_id')->unsigned()->nullable()->default(0);
$table->float('bayesian_rating', 10, 0)->nullable();
$table->integer('must_update')->default(1);
$table->timestamps();
});
Answers table schema:
CREATE TABLE `answers` (
`id` int(10) UNSIGNED NOT NULL,
`answer` text COLLATE utf8mb4_unicode_ci,
`is_deleted` int(11) NOT NULL DEFAULT '0',
`question_id` int(11) DEFAULT NULL,
`rank` int(11) NOT NULL DEFAULT '1',
`user_id` int(10) UNSIGNED DEFAULT '0',
`bayesian_rating` double DEFAULT NULL,
`must_update` int(11) NOT NULL DEFAULT '1',
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
Update database/migrations/xxxx_xx_xx_xxxxxx_create_votes_table.php file and add four fields:
public function up()
{
Schema::create('votes', function (Blueprint $table) {
$table->increments('id');
$table->integer('vote')->nullable();
$table->integer('user_id')->nullable();
$table->integer('answer_id')->nullable();
$table->integer('is_deleted')->default(0);
$table->timestamps();
});
}
Votes table schema:
CREATE TABLE `votes` (
`id` int(10) UNSIGNED NOT NULL,
`vote` int(11) DEFAULT NULL,
`user_id` int(11) DEFAULT NULL,
`answer_id` int(11) DEFAULT NULL,
`is_deleted` int(11) NOT NULL DEFAULT '0',
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
Now we can migrate our database using artisan command:
php artisan migrate
Seeding the Database
Next, we need to seed our tables. In seeds/DatabaseSeeder.php update run() function:
use App\Answer;
use App\Question;
use App\User;
use App\Vote;
use Faker\Factory as Faker;
use Illuminate\Database\Seeder;
use Illuminate\Support\Str;
class DatabaseSeeder extends Seeder {
/* Seed the application's database.
*
* @return void
*/
public function run() {
// use the faker library to generate fake data
$faker = Faker::create();
// create 20 users
for ( $i = 0; $i < 20; $i ++ ) {
User::create( [
'name' => $faker->name,
'email' => $faker->unique()->safeEmail,
'email_verified_at' => now(),
'password' => '$2y$10$TKh8H1.PfQx37YgCzwiKb.KjNyWgaHb9cbcoQgdIVFlYg7B77UdFm', // secret
'remember_token' => Str::random( 10 ),
] );
}
$users = User::all()->pluck( 'id' )->toArray();
// create 5 questions
for ( $i = 0; $i < 5; $i ++ ) {
$question_value = $faker->sentence( 6 );
$question = Question::create( [
'question' => $question_value,
'slug' => str_slug( $question_value ),
] );
}
$questions = Question::all()->pluck( 'id' )->toArray();
// each question to have 8 answers
foreach ($questions as $question){
for ( $i = 0; $i < 8; $i ++ ) {
$answer = Answer::create( [
'answer' => $faker->paragraph( 6 ),
'question_id' => $faker->randomElement( $questions ),
'user_id' => $question,
] );
}
}
// a question to have either 6 votes or 12 votes or 18 votes.
// A vote can either be a upvote or a downvote i.e 0 or 1
$votes = $faker->randomElement( [ 6, 12, 18 ] );
$users_clone = $users;
$answers = Answer::all()->pluck( 'id' )->toArray();
foreach ($answers as $answer){
for ( $i = 0; $i < $votes; $i ++ ) {
$del_val = $faker->randomElement( $users );
Vote::create( [
'vote' => $faker->randomElement( [ 0, 1 ] ),
'answer_id' => $answer,
'user_id' => $del_val,
] );
if ( ( $key = array_search( $del_val, $users_clone ) ) !== false ) {
unset( $users_clone[ $key ] );
}
}
}
for ( $i = 0; $i < $votes; $i ++ ) {
$del_val = $faker->randomElement( $users );
Vote::create( [
'vote' => $faker->randomElement( [ 0, 1 ] ),
'answer_id' => $faker->randomElement( $answers ),
'user_id' => $del_val,
] );
if ( ( $key = array_search( $del_val, $users_clone ) ) !== false ) {
unset( $users_clone[ $key ] );
}
}
}
}
We are using the Faker library included in Laravel to quickly generate data. The seeder generates fake testing data including the foreign key values. Seed our database with the artisan command:
php artisan db:seed
Creating the Cronjob Controller
Next, let’s create a controller that will process the requests. In your console, run the artisan command:
php artisan make:controller CronjobController
Then add an API route in routes/api.php to point to rank() method in CronjobController.
Route::get( 'rank', array( 'as' => 'rank', 'uses' => 'CronjobController@rank' ) );
Creating Relationships
Next up, we need to define relationships between the models. Questions and Answers tables have one to many relations. One question can have many answers. The relationship is defined by placing a function on the Question model:
// app/Question.php --path
/**
* Get answers associated with a question
*/
public function answers() {
return $this->hasMany( 'App\Answer' );
}
Similarly, Answers and Votes tables have one to many relationship. The relationship is defined by placing a function on the Answer model:
// app/Answer.php
/**
* Get votes associated with an answer
*/
public function votes() {
return $this->hasMany( 'App\Vote' );
}
Creating Bayesian Calculation Methods
First, let’s create a function to pull active answers associated with a particular question passed as an argument. In Answer model add the function:
/**
* Used in calculating Bayesian rating
*
* @param $question_id
* @param $order_by
* @param $order
*
* @return mixed
*/
public function getAnswers( $question_id, $order_by = "id", $order = "asc" ) {
$answers = Question::find( $question_id )->answers()->where( 'is_deleted', 0 )
->orderBy( $order_by, $order )->get();
return $answers;
}
Equivalent query:
select * from `answers` where `answers`.`question_id` = 1 and `answers`.`question_id` is not null and `is_deletedy` = 0 order by `id` asc
Next, create methods to calculate an answer’s upvotes, downvotes, and total votes (sum of upvotes and downvotes) and average total votes:
// app/Answer.php
/**
* Total up votes
* @param $id integer question_id
* @return mixed
*/
public function getUpvotes( $id = null ) {
$data = 0;
$id = isset( $this->id ) ? $this->id : $id;
$answers = Answer::find( $id );
if ( $answers && count( $answers->votes ) ) {
$data = $answers->votes()->where( 'is_deleted', 0 )->where( 'vote', 1 )->count();
}
return $data;
}
Equivalent query:
select count(*) as aggregate from `votes` where `votes`.`answer_id` = 2 and `votes`.`answer_id` is not null and `is_deleted` = 0 and `vote` = 1
/**
* Total down votes
* @param $id integer question_id
* @return mixed
*/
public function getDownvotes( $id = null ) {
$data = 0;
$id = isset( $this->id ) ? $this->id : $id;
$answers = Answer::find( $id );
if ( $answers && count( $answers->votes ) ) {
$data = $answers->votes()->where( 'is_deleted', 0 )->where( 'vote', 0 )->count();
}
return $data;
}
Equivalent query:
select count(*) as aggregate from `votes` where `votes`.`answer_id` = 2 and `votes`.`answer_id` is not null and `is_deleted` = 0 and `vote` = 0
/**
* Sum of up votes and down votes
* @param $id
*
* @return float|int
*/
public function getVotes( $id ) {
$total = $this->getUpvotes( $id ) + $this->getDownvotes( $id );
return $total;
}
/**
* Average total votes
* @param $answers object
*
* @return float|int
*/
public function getAverageVotes( $answers ) {
$average = 0;
$sum = 0;
$count = 0;
foreach ( $answers as $answer ) {
$sum += $this->getVotes( $answer->id );
$count ++;
}
if ( $sum > 0 ) {
$average = $sum / $count;
}
return $average;
}
We also need to implement methods to calculate an answer’s rating (upvotes divided by total votes) and the average rating of all answers to a question:
/**
* Upvotes divided by the sum of upvotes and downvotes
* @param $id
*
* @return float|int
*/
public function getRating( $id ) {
$value = 0;
$total = $this->getVotes( $id );
if ( $total > 0 ) {
$value = $this->getUpvotes( $id ) / $total;
}
return $value;
}
/**
* Average rating
* @param $definitions object
*
* @return float|int
*/
public function getAverageRating( $definitions ) {
$average = 0;
$sum = 0;
$count = 0;
foreach ( $definitions as $definition ) {
$sum += $this->getRating( $definition->id );
$count ++;
}
if ( $sum > 0 ) {
$average = $sum / $count;
}
return $average;
}
That is all we need to calculate the Bayesian rating. Let’s put all the methods together and calculate the rating as is shown below:
/**
* Get Bayesian rating
*
* @param $answers object
*
* @return array
*/
public function getBayesianRating( $answers ) {
foreach ( $answers as $answer ) {
$votes = $this->getVotes( $answer->id );
$average_votes = $this->getAverageVotes( $answers );
$numerator = ( $average_votes * $this->getAverageRating( $answers ) ) +
( $votes * $this->getRating( $answer->id ) );
$denominator = $average_votes + $votes;
$bayesian_rating = ( $denominator > 0 ) ? $numerator / $denominator : 0;
$answer->bayesian_rating = $bayesian_rating;
$answer->must_update = 0;
$answer->save();
}
if ( $answers ) {
$count = 1;
$answers_1 = $this->getAnswers( $answers[0]->question_id, 'bayesian_rating', 'desc' );
foreach ( $answers_1 as $value ) {
$value->rank = $count;
$value->save();
$count ++;
}
}
}
The method calculates the Bayesian rating, updates the bayesian_rating column in answers table, updates the must_update column to 0 then finally, it loops through the updated answers and updates the rank column in ascending order starting from 1 based on the value in the bayesian_rating column.
The above method will only rank answers for a single question. We need to create a function that checks for all questions whose answers have a must update status and ranks all of them as shown in the snippet:
/**
* @return string
*/
public function triggerRanking() {
$question_ids = Answer::select( 'question_id' )->where( 'must_update', 1 )->where( 'is_deleted', 0 )
->distinct( 'question_id' )->get();
foreach ( $question_ids as $question_id ) {
$answers = $this->getAnswers( $question_id->question_id);
if ( $answers ) {
$this->getBayesianRating( $answers );
}
}
return "success";
}
NOTE:
Remember to always set the must_update column in answers table to 1 whenever an answer is upvoted or downvoted. The first time you hit the endpoint calling the function triggerRanking(), all the answers will be ranked since the must_update column is set to 1 by default. However, subsequent hits will only affect answers whose votes have changed.
Finally, let's create an endpont in CronjobController and call the triggerRanking() function:
/**
* cron job to update answers ranking
* should be run after every few seconds
*/
public function rank(){
$model=new Answer();
return $model->triggerRanking();
}
Testing
From the test data we generated earlier, the computations for a question id I randomly picked are:
Question Id 2 | |||||
---|---|---|---|---|---|
Upvotes | Downvotes | Total Votes | Rating | Bayesian Rating | Answer id |
9 | 9 | 18 | 0.5 | 1 | |
10 | 8 | 18 | 0.555555556 | 15 | |
11 | 7 | 18 | 0.611111111 | 22 | |
7 | 11 | 18 | 0.388888889 | 24 | |
7 | 12 | 19 | 0.368421053 | 31 | |
5 | 14 | 19 | 0.263157895 | 33 | |
12 | 6 | 18 | 0.666666667 | 34 | |
7 | 11 | 18 | 0.388888889 | 35 | |
9 | 9 | 18 | 0.5 | 37 | |
9 | 9 | 18 | 0.5 | 38 |
Run the app and access the URL (http://localhost:8000/api/rank) from Postman or in the browser to calculate the Bayesian average. The results are shown in the table below.
Question Id 2 | |||||
---|---|---|---|---|---|
Upvotes | Downvotes | Total Votes | Rating | Bayesian Rating | Answer id |
9 | 9 | 18 | 0.5 | 0.487063423 | 1 |
10 | 8 | 18 | 0.555555556 | 0.514687732 | 15 |
11 | 7 | 18 | 0.611111111 | 0.542312042 | 22 |
7 | 11 | 18 | 0.388888889 | 0.431814804 | 24 |
7 | 12 | 19 | 0.368421053 | 0.420206879 | 31 |
5 | 14 | 19 | 0.263157895 | 0.366443438 | 33 |
12 | 6 | 18 | 0.666666667 | 0.569936351 | 34 |
7 | 11 | 18 | 0.388888889 | 0.431814804 | 35 |
9 | 9 | 18 | 0.5 | 0.487063423 | 37 |
9 | 9 | 18 | 0.5 | 0.487063423 | 38 |
Conclusion
In production, you will have to create a cron job to hit the endpoint after every set duration of time. You can opt to add other metrics to the rating system to improve efficiency. For example, we can order answers by date created when Bayesian average is the same for two or more answers.
You can also opt to implement the rating system solely using a database. For example, in MySQL, you will have to create functions/procedures and triggers.
The complete source code for this tutorial can be found in Github.