DQL vs Doctrine Query Builder


In this final video in this series we are going to discuss using Doctrine's Query Builder, and DQL (Doctrine Query Language) as ways of working more efficiently with related data.

To begin with, we are going to expand the Reddit Scraper code to enable scraping more than just the first results page returned from Reddit's API. The code I am using to do this is not the most optimal, or best implementation of a web scraper ever written. Far from it. But it does the job, and as this is not a tutorial on builder a scraper, it's as good as it needs to be :)

<?php

// src/AppBundle/Service/RedditScraper.php

namespace AppBundle\Service;

use AppBundle\Entity\RedditAuthor;
use AppBundle\Entity\RedditPost;
use Doctrine\ORM\EntityManagerInterface;

class RedditScraper
{
    /**
     * @var EntityManagerInterface
     */
    private $em;

    public function __construct(EntityManagerInterface $em)
    {
        $this->em = $em;
    }

    public function scrape()
    {
        $client = new \GuzzleHttp\Client();
        $after = null;

        for ($x=0; $x<5; $x++) {
            $response = $client->request('GET', 'https://api.reddit.com/r/php.json?limit=25&after=' . $after);

            $contents[$x] = json_decode($response->getBody()->getContents(), true);

            $after = $contents[$x]['data']['after'];
        }

        foreach ($contents as $content) {
            foreach ($content['data']['children'] as $child) {

                $redditPost = new RedditPost();
                $redditPost->setTitle($child['data']['title']);

                $authorName = $child['data']['author'];

                $redditAuthor = $this->em->getRepository('AppBundle:RedditAuthor')->findOneBy([
                    'name' => $authorName
                ]);

                if (!$redditAuthor) {
                    $redditAuthor = new RedditAuthor();
                    $redditAuthor->setName($authorName);

                    $this->em->persist($redditAuthor);
                    $this->em->flush();
                }

                $redditAuthor->addPost($redditPost);

                $this->em->persist($redditPost);
            }
        }

        $this->em->flush();
    }
}

Running the scraper - assuming you don't hit upon timeouts from the Reddit API - should populate your database with enough results to illustrate the next stages of this tutorial.

Once you have created / updated / captured the data that your system is interested in, usually you want to display it in some form. This may be via a Twig template, or by returning the result set (or part of it) as a JSON feed, or the myriad other ways that data can be presented.

It is during the data retrieval phase that - if you are not careful - you can end up hammering your database every single time you want to display what you most likely consider 'basic' information.

Let me demonstrate this with an example.

In our system so far we have RedditPost entities which belong to RedditAuthor entities. Let's pretend we want to display a really basic clone of any standard subreddit Reddit page:

<!-- app/Resources/views/reddit/index.html.twig -->

{% extends '::base.html.twig' %}

{% block body %}

    <ul>
        {% for post in posts %}
        <li>{{ post.title }}</li>
        {% endfor %}
    </ul>

{% endblock %}

Writing the query for this is trivial.

// src/AppBundle/Controller/RedditController.php

class RedditController extends Controller
{
    /**
     * @Route("/", name="list")
     */
    public function listAction()
    {
        return $this->render(':reddit:index.html.twig', [
            'posts' => $this->getDoctrine()->getRepository('AppBundle\Entity\RedditPost')->findAll()
        ]);
    }

This works well. We end up querying our database once, and that is sufficient to get back all the results we want.

But, lets say that we decide the front end / Twig template needs changing. We want to include the RedditAuthor's name also. After all, we captured this information, so why not display it?

Let's quickly update the relevant parts:

<!-- app/Resources/views/reddit/index.html.twig -->
{% block body %}
    <ul>
        {% for post in posts %}
        <li>{{ post.title }} - {{ posts.author.name }}</li>
        {% endfor %}
    </ul>
{% endblock %}

Simple enough. Refresh the front end / list page. But what's this? Suddenly we have 100s of queries. What the?

Well, as it turns out, using the findAll method is doing only as much as we have told it too. We are telling Doctrine to find all RedditPost entities. Which it will, and which it has.

Inside our template however, we are saying... actually, can you just grab me the author's name for each of these posts?

Doctrine's agreeable with this - but the operation isn't free. We only pulled back everything from the RedditPost entity in the findAll query, which doesn't include the related data.

This is really important.

Notice we didn't actually break anything by asking for more data than we had originally queried for. This is powerful, but has some drawbacks which are perhaps not immediately obvious.

Any data we didn't ask for in our originally query for has to be queried for separately. Ouch.

This is a double edged sword.

It's cool that we pull in additional data as needed / on demand. I mean, this is a really, really nice feature.

But it's bad because it enables us to write some really inefficient code - sometimes without even realising we are doing so.

Now, assuming you are working in an environment in which it is enabled (think: app_dev), then Symfony has the web debug toolbar. The web debug toolbar has many useful purposes, one of which is that it will give us a count of all the queries executed to render the current page. You can even click on this query count to show you every single one of those queries, explain the query, and show the runnable query... which is frankly awesome.

Assuming you have the web debug toolbar available, you will get an early warning as to potentially sub-optimal query usage. This comes in the form of a yellow background on the query section of the web debug toolbar if more than 50 queries were run to execute the current page.

This still requires you to be doing more than 49 queries on any given page AND be running the web debug toolbar to easily notice the problem.

When To Use DQL vs Doctrine's Query Builder

There isn't an inbuilt helper query (think findAll, findBy, etc) for returning data efficiently from multiple entities. To do this, we must write our own query.

Frustratingly, as is often the case with code, there are multiple ways to write a query. The two ways I will discuss in this video / write up are using Doctrine's Query Builder, and DQL (Doctrine Query Language). But be aware you can also go with native SQL, if you absolutely must.

My immediate question when there exists multiple solutions to the same problem is: which do I use, when, and why?

Simply put, I tend to use DQL whenever possible, and only go for the query builder when parts of the query are conditional.

To better illustrate this, consider this example:

// a standard DQL query

$query = $this->em->createQuery(
    "
    SELECT p
    FROM AppBundle\Entity\RedditPost p
    "
);

$data = $query->getResult();

This is a really simple query - essentially select all from RedditPost.

But what if I wanted to add in a where clause?

// a standard DQL query with WHERE and a parameter

$query = $this->em->createQuery(
    "
    SELECT p
    FROM AppBundle\Entity\RedditPost p
    WHERE p.id > :id
    "
)->setParameter('id', 50);

$data = $query->getResult();

Ok, still simple enough.

But what if I wanted to only add in the where clause some of the time?

// it is still DQL, but now it is painful

var $someConditional = false;

if ($someConditional === true) {
    $query = $this->em->createQuery(
        "
        SELECT p
        FROM AppBundle\Entity\RedditPost p
        WHERE p.id > :id
        "
    )->setParameter('id', 50);
} else {
    $query = $this->em->createQuery(
    "
        SELECT p
        FROM AppBundle\Entity\RedditPost p
    "
    );
}

$data = $query->getResult();

Eughhh.

Instead, this would be much better suited to using the Doctrine Query Builder:

// using Doctrine's Query Builder

var $someConditional = false;

$query = $this->getDoctrine()->getRepository('AppBundle:RedditPost')->createQueryBuilder('p');

if ($someConditional === true) {
    $query
        ->where('p.id > :id')
        ->setParameter('id', 50)
    ;
} 

$data = $query->getQuery()->getResult();

Both the Query Builder and DQL give you the same outcome, but differ in use case. I tend towards DQL unless I need to conditionally include / exclude parts of the query - and in that case I'd use the Query Builder.

Writing Our Own Query

To fix the original problem where we had many queries simply to return a related field, we need write a custom query. As above, we can do this using either DQL or the Query Builder.

To get back the data in one result, we need to join the two objects together. You can think of this exactly like doing a join in SQL-land:

DQL Approach

Writing a join in DQL is really quite straightforward. It looks almost identical to a SQL join:

$query = $this->em->createQuery(
"
    SELECT p, a
    FROM AppBundle\Entity\RedditPost p
    JOIN p.author a
"
);

$data = $query->getResult();

The hardest part is remembering to add in all the p and a references. At least, that's what I usually forget to do.

The Doctrine Query Builder Approach

Writing a join in Doctrine's Query Builder is easy enough... once you know the syntax. If you don't know the syntax, this is not very intuitive at all, in my opinion:

$query = $this->getDoctrine()
    ->getRepository('AppBundle:RedditPost')
    ->createQueryBuilder('p')
    ->join('p.author', 'a')
    ->addSelect('a')
    ->getQuery()
;

$data = $query->getResult();

For me, this is nowhere near as obvious as writing the DQL equivalent.

It makes sense when you break it down:

$query = $this->getDoctrine()
    ->getRepository('AppBundle:RedditPost')
    ->createQueryBuilder('p')

createQueryBuilder('p') implicitly creates the equivalent:

    SELECT p
    FROM AppBundle\Entity\RedditPost p

Which is fine. This is why we don't need an explicit select or from in our query builder.

What is less obvious - at first glance - is that we won't get back the joined data, unless we explicitly ask for it, by way of an addSelect statement.

Ok, this is a little more advanced than what we have been working with up until now - and be sure to watch the video for a more in-depth explanation of what is happening and why.

As discussed however, it is important to have an understanding of which tool to use, and why. It is also important to understand that once you are working with related data, you really must start writing your own queries.

Limit And Offset

One further point to cover is the use limits, and offsets.

Often in SQL you would want to limit the number of returned results. Using an offset is a little less common, but very useful for pagination. Even so, they both come under the same sort of area.

What is not so obvious upfront is that you can't add a limit or offset to a DQL query. Instead, both are set the same way:

$query = $this->getDoctrine()
    ->getRepository('AppBundle:RedditPost')
    ->createQueryBuilder('p')
    ->getQuery()
;

// or

$query = $this->em->createQuery(
"
    SELECT p
    FROM AppBundle\Entity\RedditPost p
"
)->getQuery();

// then

$posts = $query
    ->setFirstResult(45)
    ->setMaxResults(5)
    ->getResult()
;

In Doctrine terminology, the limit is setMaxResults, and the offset is setFirstResult. More on this here.

Doctrine relationships become very important as we start to work with Symfony's form, which is exactly what we shall do in the very next series.

Code For This Course

Get the code for this course.

Episodes