Got a tip? Let us know

                 


Upvote/Downvote Weighted Ranking System in Laravel Using Bayesian Formula

March 06,2019 2 comments

Recently, I was developing a Q & A application that required me to rate answers based on the number of likes and dislikes received. The plan was to ensure that answers with the most likes and least dislikes appeared at the top. This got me thinking. How could I make the ranking system efficient, unbiased and dynamic? After a few hours of research, I bumped into an article  by Markus Weichselbaum on the Bayesian rating that saved me a great deal. Spare a few minutes and go through the article to understand what Bayesian rating is and why it is the method of choice.

In this tutorial, we are going to discuss how to implement the backend for the weighted rating system in Laravel using the Bayesian formula. No background in Laravel is required as one can easily copy the code and use it in any other PHP project. We will create a simple Question Answer application to demonstrate the concept.

Creating Migrations and Models

We need to create models and migrations for the four tables we will use in this tutorial. The four tables are questions, answers, users and votes. In Laravel ,authentication is very simple. Everything is configured to work out of the box. In the console, run the following command:

php artisan make:auth

This command will create the authentication views, a user model, the routes for the authentication endpoints as well as the migration for users table.

Not using Laravel? No problem. Below is the users' table schema:

 
CREATE TABLE `users` (
  `id` bigint(20) UNSIGNED NOT NULL,
  `name` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
  `email` varchar(191) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `email_verified_at` timestamp NULL DEFAULT NULL,
  `password` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
  `remember_token` varchar(100) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `created_at` timestamp NULL DEFAULT NULL,
  `updated_at` timestamp NULL DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

Next, let’s create the models and migrations for the other three tables using the commands below:

php artisan make:model Question -m

php artisan make:model Answer -m

php artisan make:model Vote -m

The commands will create Question, Answer and Vote models respectively. The -m option will automatically create a migration for us eliminating the need to run the “create migration” command.

Update database/migrations/xxxx_xx_xx_xxxxxx_reate_questions_table.php file and add three fields:


public function up(){
   Schema::create('questions', function (Blueprint $table) {
    $table->increments('id');
    $table->string('question');
    $table->integer('is_deleted')->default(0);
    $table->text('slug', 65535);
    $table->timestamps();
   });
}

Questions table schema:


CREATE TABLE `questions` (
  `id` int(10) UNSIGNED NOT NULL,
  `question` text COLLATE utf8mb4_unicode_ci NOT NULL,
  `is_deleted` int(11) NOT NULL DEFAULT '0',
  `slug` text COLLATE utf8mb4_unicode_ci NOT NULL,
  `created_at` timestamp NULL DEFAULT NULL,
  `updated_at` timestamp NULL DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

Update database/migrations/xxxx_xx_xx_xxxxxx_create_answers_table.php file and add seven fields:

Schema::create('answers', function (Blueprint $table) {
   $table->increments('id');
$table->text('answer', 65535)->nullable();
$table->integer('is_deleted')->default(0);
$table->integer('question_id')->nullable();
$table->integer('rank')->default(1);
$table->integer('user_id')->unsigned()->nullable()->default(0);
$table->float('bayesian_rating', 10, 0)->nullable();
$table->integer('must_update')->default(1);
   $table->timestamps();
});

Answers table schema:


CREATE TABLE `answers` (
  `id` int(10) UNSIGNED NOT NULL,
  `answer` text COLLATE utf8mb4_unicode_ci,
  `is_deleted` int(11) NOT NULL DEFAULT '0',
  `question_id` int(11) DEFAULT NULL,
  `rank` int(11) NOT NULL DEFAULT '1',
  `user_id` int(10) UNSIGNED DEFAULT '0',
  `bayesian_rating` double DEFAULT NULL,
  `must_update` int(11) NOT NULL DEFAULT '1',
  `created_at` timestamp NULL DEFAULT NULL,
  `updated_at` timestamp NULL DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

Update database/migrations/xxxx_xx_xx_xxxxxx_create_votes_table.php file and add four fields:


public function up()
{
   Schema::create('votes', function (Blueprint $table) {
       $table->increments('id');
    $table->integer('vote')->nullable();
    $table->integer('user_id')->nullable();
    $table->integer('answer_id')->nullable();
    $table->integer('is_deleted')->default(0);
    $table->timestamps();
   });
}

Votes table schema:


CREATE TABLE `votes` (
  `id` int(10) UNSIGNED NOT NULL,
  `vote` int(11) DEFAULT NULL,
  `user_id` int(11) DEFAULT NULL,
  `answer_id` int(11) DEFAULT NULL,
  `is_deleted` int(11) NOT NULL DEFAULT '0',
  `created_at` timestamp NULL DEFAULT NULL,
  `updated_at` timestamp NULL DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

Now we can migrate our database using artisan command:

php artisan migrate

Seeding the Database

Next, we need to seed our tables. In seeds/DatabaseSeeder.php update run() function:


use App\Answer;
use App\Question;
use App\User;
use App\Vote;
use Faker\Factory as Faker;
use Illuminate\Database\Seeder;
use Illuminate\Support\Str;

class DatabaseSeeder extends Seeder {
  /* Seed the application's database.
   *
   * @return void
   */
  public function run() {
     // use the faker library to generate fake data
     $faker = Faker::create();

     // create 20 users
     for ( $i = 0; $i < 20; $i ++ ) {
        User::create( [
           'name'              => $faker->name,
           'email'             => $faker->unique()->safeEmail,
           'email_verified_at' => now(),
           'password'          => '$2y$10$TKh8H1.PfQx37YgCzwiKb.KjNyWgaHb9cbcoQgdIVFlYg7B77UdFm', // secret
           'remember_token'    => Str::random( 10 ),
        ] );
     }
     $users = User::all()->pluck( 'id' )->toArray();

     // create 5 questions
     for ( $i = 0; $i < 5; $i ++ ) {
        $question_value = $faker->sentence( 6 );
        $question       = Question::create( [
           'question' => $question_value,
           'slug'     => str_slug( $question_value ),
        ] );

     }
     $questions = Question::all()->pluck( 'id' )->toArray();
     // each question to have 8 answers
     foreach ($questions as $question){
        for ( $i = 0; $i < 8; $i ++ ) {
           $answer = Answer::create( [
              'answer'      => $faker->paragraph( 6 ),
              'question_id' => $faker->randomElement( $questions ),
              'user_id'     => $question,
           ] );
        }
     }
     // a question to have either 6 votes or 12 votes or 18 votes.
     // A vote can either be a upvote or a downvote i.e 0 or 1
     $votes       = $faker->randomElement( [ 6, 12, 18 ] );
     $users_clone = $users;
     $answers = Answer::all()->pluck( 'id' )->toArray();
     foreach ($answers as $answer){
        for ( $i = 0; $i < $votes; $i ++ ) {
           $del_val = $faker->randomElement( $users );
           Vote::create( [
              'vote'      => $faker->randomElement( [ 0, 1 ] ),
              'answer_id' => $answer,
              'user_id'   => $del_val,
           ] );
           if ( ( $key = array_search( $del_val, $users_clone ) ) !== false ) {
              unset( $users_clone[ $key ] );
           }
        }
     }
     for ( $i = 0; $i < $votes; $i ++ ) {
        $del_val = $faker->randomElement( $users );
        Vote::create( [
           'vote'      => $faker->randomElement( [ 0, 1 ] ),
           'answer_id' => $faker->randomElement( $answers ),
           'user_id'   => $del_val,
        ] );
        if ( ( $key = array_search( $del_val, $users_clone ) ) !== false ) {
           unset( $users_clone[ $key ] );
        }
     }
  }
}

We are using the Faker library included in Laravel to quickly generate data. The seeder generates fake testing data including the foreign key values. Seed our database with the artisan command:

php artisan db:seed

Creating the Cronjob Controller

Next, let’s create a controller that will process the requests. In your console, run the artisan command:

php artisan make:controller CronjobController

Then add an API route in routes/api.php to point to rank() method in CronjobController.


Route::get( 'rank', array( 'as' => 'rank', 'uses' => 'CronjobController@rank' ) );

Creating Relationships

Next up, we need to define relationships between the models. Questions and Answers tables have one to many relations. One question can have many answers. The relationship is defined by placing a function on the Question model:


// app/Question.php                       --path      

/**
* Get answers associated with a question
*/
public function answers() {
  return $this->hasMany( 'App\Answer' );
}

Similarly, Answers and Votes tables have one to many relationship. The relationship is defined by placing a function on the Answer model:


// app/Answer.php                      

/**
* Get votes associated with an answer
*/
public function votes() {
  return $this->hasMany( 'App\Vote' );
}

Creating Bayesian Calculation Methods

First, let’s create a function to pull active answers associated with a particular question passed as an argument. In Answer model add the function:


/**
* Used in calculating Bayesian rating
*
* @param $question_id
* @param $order_by
* @param $order
*
* @return mixed
*/
public function getAnswers( $question_id, $order_by = "id", $order = "asc" ) {
 
  $answers = Question::find( $question_id )->answers()->where( 'is_deleted', 0 )
                     ->orderBy( $order_by, $order )->get();

  return $answers;
}

Equivalent query:


select * from `answers` where `answers`.`question_id` = 1 and `answers`.`question_id` is not null and `is_deletedy` = 0 order by `id` asc

Next, create methods to calculate an answer’s upvotes, downvotes, and total votes (sum of upvotes and downvotes) and average total votes:


// app/Answer.php 

/**
* Total up votes
* @param $id integer question_id
* @return mixed
*/
public function getUpvotes( $id = null ) {
  $data       = 0;
  $id         = isset( $this->id ) ? $this->id : $id;
  $answers = Answer::find( $id );
  if ( $answers && count( $answers->votes ) ) {
     $data = $answers->votes()->where( 'is_deleted', 0 )->where( 'vote', 1 )->count();
  }

  return $data;
}

Equivalent query:


select count(*) as aggregate from `votes` where `votes`.`answer_id` = 2 and `votes`.`answer_id` is not null and `is_deleted` = 0 and `vote` = 1

/**
* Total down votes
* @param $id integer question_id
* @return mixed
*/
public function getDownvotes( $id = null ) {
  $data       = 0;
  $id         = isset( $this->id ) ? $this->id : $id;
  $answers = Answer::find( $id );
  if ( $answers && count( $answers->votes ) ) {
     $data = $answers->votes()->where( 'is_deleted', 0 )->where( 'vote', 0 )->count();
  }

  return $data;
}

Equivalent query:


select count(*) as aggregate from `votes` where `votes`.`answer_id` = 2 and `votes`.`answer_id` is not null and `is_deleted` = 0 and `vote` = 0

/**
* Sum of up votes and down votes
* @param $id
*
* @return float|int
*/
public function getVotes( $id ) {
  $total = $this->getUpvotes( $id ) + $this->getDownvotes( $id );

  return $total;
}

/**
* Average total votes
* @param $answers object
*
* @return float|int
*/
public function getAverageVotes( $answers ) {
  $average = 0;
  $sum     = 0;
  $count   = 0;
  foreach ( $answers as $answer ) {
     $sum += $this->getVotes( $answer->id );
     $count ++;
  }
  if ( $sum > 0 ) {
     $average = $sum / $count;
  }

  return $average;
}

We also need to implement methods to calculate an answer’s rating (upvotes divided by total votes) and the average rating of all answers to a question:


/**
* Upvotes divided by the sum of upvotes and downvotes
* @param $id
*
* @return float|int
*/
public function getRating( $id ) {
  $value = 0;
  $total = $this->getVotes( $id );
  if ( $total > 0 ) {
     $value = $this->getUpvotes( $id ) / $total;
  }

  return $value;
}

/**
* Average rating
* @param $definitions object
*
* @return float|int
*/
public function getAverageRating( $definitions ) {
  $average = 0;
  $sum     = 0;
  $count   = 0;
  foreach ( $definitions as $definition ) {
     $sum += $this->getRating( $definition->id );
     $count ++;
  }
  if ( $sum > 0 ) {
     $average = $sum / $count;
  }

  return $average;
}

That is all we need to calculate the Bayesian rating. Let’s put all the methods together and calculate the rating as is shown below:


/**
* Get Bayesian rating
*
* @param $answers object
*
* @return array
*/
public function getBayesianRating( $answers ) {
  foreach ( $answers as $answer ) {
     $votes                   = $this->getVotes( $answer->id );
     $average_votes           = $this->getAverageVotes( $answers );
     $numerator               = ( $average_votes * $this->getAverageRating( $answers ) ) +
                                ( $votes * $this->getRating( $answer->id ) );
     $denominator             = $average_votes + $votes;
     $bayesian_rating         = ( $denominator > 0 ) ? $numerator / $denominator : 0;
     $answer->bayesian_rating = $bayesian_rating;
     $answer->must_update     = 0;
     $answer->save();
  }
  if ( $answers ) {
     $count         = 1;
     $answers_1 = $this->getAnswers( $answers[0]->question_id, 'bayesian_rating', 'desc' );
     foreach ( $answers_1 as $value ) {
        $value->rank = $count;
        $value->save();
        $count ++;
     }
  }
}

The method calculates the Bayesian rating, updates the bayesian_rating column in answers table, updates the must_update column to 0 then finally, it loops through the updated answers and updates the rank column in ascending order starting from 1 based on the value in the bayesian_rating column.

The above method will only rank answers for a single question. We need to create a function that checks for all questions whose answers have a must update status and ranks all of them as shown in the snippet:


/**
* @return string
*/
public function triggerRanking() {
  $question_ids = Answer::select( 'question_id' )->where( 'must_update', 1 )->where( 'is_deleted', 0 )
                        ->distinct( 'question_id' )->get();
  foreach ( $question_ids as $question_id ) {
        $answers = $this->getAnswers( $question_id->question_id);
        if ( $answers ) {
           $this->getBayesianRating( $answers );
        }
  }

  return "success";
}

NOTE:

Remember to always set the must_update column in answers table to 1 whenever an answer is upvoted or downvoted. The first time you hit the endpoint calling the function triggerRanking(), all the answers will be ranked since the must_update column is set to 1 by default. However, subsequent hits will only affect answers whose votes have changed.

Finally, let's create an endpont in CronjobController and call the triggerRanking() function:


/**
* cron job to update answers ranking
* should be run after every few seconds
*/
  public function rank(){

  $model=new Answer();
  return $model->triggerRanking();
}

Testing

From the test data we generated earlier, the computations for a question id I randomly picked are:

Question Id 2
Upvotes Downvotes Total Votes Rating Bayesian Rating Answer id
9 9 18 0.5   1
10 8 18 0.555555556   15
11 7 18 0.611111111   22
7 11 18 0.388888889   24
7 12 19 0.368421053   31
5 14 19 0.263157895   33
12 6 18 0.666666667   34
7 11 18 0.388888889   35
9 9 18 0.5   37
9 9 18 0.5   38

Run the app and access the URL (http://localhost:8000/api/rank) from Postman or in the browser to calculate the Bayesian average. The results are shown in the table below.

Question Id 2
Upvotes Downvotes Total Votes Rating Bayesian Rating Answer id
9 9 18 0.5 0.487063423 1
10 8 18 0.555555556 0.514687732 15
11 7 18 0.611111111 0.542312042 22
7 11 18 0.388888889 0.431814804 24
7 12 19 0.368421053 0.420206879 31
5 14 19 0.263157895 0.366443438 33
12 6 18 0.666666667 0.569936351 34
7 11 18 0.388888889 0.431814804 35
9 9 18 0.5 0.487063423 37
9 9 18 0.5 0.487063423 38

Conclusion

In production, you will have to create a cron job to hit the endpoint after every set duration of time. You can opt to add other metrics to the rating system to improve efficiency. For example, we can order answers by date created when Bayesian average is the same for two or more answers.

You can also opt to implement the rating system solely using a database. For example, in MySQL, you will have to create functions/procedures and triggers.

The complete source code for this tutorial can be found in Github.



More in this category: How to install apps from external sources in your smart android TV »


Comments

Peter - Mar 07,2019 at 09:24 am
Awesome tutorial.I was looking for something like this..
Ngeene - Mar 11,2019 at 07:58 am
this is a really great piece. it's nice to see the Kenyan tech circle growing. Keep up the good work.

Add Comment