Importing CSV Data The Easy Way


In this video we are continuing on with the creation of a Symfony console command which allows the importing of a CSV file. As mentioned in the previous video, this implementation is very basic and illustrates "a process", not "the process".

By the end of this video you will have learned how to use the league/csv data manipulation library to import a CSV file stored on your server's hard drive, and used this data to populate two related Doctrine entities.

Getting started we need to grab the league/csv library as a dependency of our project. Doing this is simple enough:

composer require league/csv:^8.0

At the time of recording, I was using version 8.2 of the library. You may receive a newer version when installing, though the fundamentals should remain the same. Further installation instructions can be found here, though most likely we are done with 'installation' at this point.

Reading A CSV File in Symfony

As mentioned in the previous video, there are likely many ways to read a CSV file in Symfony. Using league/csv is one way. It's also an incredibly easy way, so that's why I like it.

In reality, where the location of your CSV may be is difficult for me to say. For the purposes of this video I am storing it inside /src/AppBundle/Data/MOCK_DATA.csv.

This is not a great location. Where best to store these files depends entirely on your applications circumstances, such as whether the CSV file is uploaded by you, or uploaded using some web form / front end, whether the system is accessed only internally, or by others, etc, etc. Use your own best judgement, or refer to StackOverflow :) if unsure.

We will start by reading the file from disk:

<?php

// /src/AppBundle/Command/CsvImportCommand.php

namespace AppBundle\Command;

use AppBundle\Entity\Athlete;
use AppBundle\Entity\Competitor;
use Doctrine\ORM\EntityManagerInterface;
use League\Csv\Reader;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;
use Symfony\Component\Console\Style\SymfonyStyle;

/**
 * Class CsvImportCommand
 * @package AppBundle\ConsoleCommand
 */
class CsvImportCommand extends Command
{
    // * snip *

    /**
     * @param InputInterface  $input
     * @param OutputInterface $output
     *
     * @return void
     * @throws \Doctrine\ORM\NonUniqueResultException
     */
    protected function execute(InputInterface $input, OutputInterface $output)
    {
        $io = new SymfonyStyle($input, $output);
        $io->title('Attempting import of Feed...');

        $reader = Reader::createFromPath('%kernel.root_dir%/../src/AppBundle/Data/MOCK_DATA.csv');

Most of this is code we saw towards the end of the previous video.

New here is the inclusion of the line:

use League\Csv\Reader;

And the more obvious $reader = Reader....

A potential point of confusion may be the use of %kernel.root_dir%. Any time you see text wrapped in percentage signs in Symfony you are working with a parameter. You can - and will - define parameters in parameters.yml. You very likely needed to do this step either as part of the original composer install, or by manually updating the file to get your database connectivity working.

However, %kernel.root_dir% along with a bunch of other %kernel.{other_things}% are special 'global' parameters. What this means is that you don't manually define them, you get them for free. What this also means is that they are hard to track down (including in the official documentation, bizarrely) and therefore potentially confusing to understand.

Ok, so %kernel.root_dir% is by far the most used of these parameters. You will see it all over Symfony. What it refers to is your app directory, which is a little odd at first as you might think a root_dir would be the root of your project - where the app, src, web, etc directories live.

But if you look inside the app directory, you will see AppKernel.php - aha! It starts to make sense.

This %kernel.root_dir% parameter becomes incredibly useful when you understand that it always points to this app path. You don't need to hardcode to a your specific directory. Instead, you can work relative to the location of the app directory.

Why all this hassle for one directory though? Well, if you think about it - the path you have your code on your local machine may be different to that of your colleague. And it almost certainly will be different to that when your code is ultimately deployed into production. This way we aren't tying ourselves down to a specific path.

Let's jump back to our code:

$reader = Reader::createFromPath('%kernel.root_dir%/../src/AppBundle/Data/MOCK_DATA.csv');

And more specifically:

'%kernel.root_dir%/../src/AppBundle/Data/MOCK_DATA.csv'

We now know that %kernel.root_dir% resolves to be /some/path/to/our/app directory.

We also know that if we have /some/path/to/our/app, then this being a Symfony app, we must also have:

  • /some/path/to/our/web
  • /some/path/to/our/var
  • /some/path/to/our/src

And so on.

The thing is, as %kernel.root_dir% points to the app directory, and we need to be in the src directory, we must therefore go 'up' and out of this app directory before we can get to the src directory. In other words, we need to go one level up, out of the app directory, back to the root of our site, and then traverse down in to src/AppBundle... etc.

That's all this line is doing. The ../ simply is the unix way of saying go up a directory.

Phew :)

One last thing to point out here - if you'd like to know of the other available kernel parameters then check out this part of the source. I had a dive into the official documentation but haven't found any solid references to the kernel parameters unfortunately. Do shout up if I am missing them somewhere!

Fetching Rows From A CSV File

We now have an instance of the Reader that is looking at our MOCK_DATA.csv file path.

Next, we can use fetchAssoc to return an array for each row in our CSV combining the first row - our 'header' information (first_name, data_of_birth, competition, etc), and then the current row's data. This method returns an Iterator, which will have some impact on one part of the code we end up writing. You needn't concern yourself with this beyond the superficial knowledge that you will receive an Iterator object, rather than an array. You can still foreach against it, but you won't encounter any "out of memory" issues if working with larger CSV files.

Knowing this, we can write a foreach loop to process each row, and create - and relate - the entities as required:

<?php

namespace AppBundle\Command;

use AppBundle\Entity\Athlete;
use AppBundle\Entity\Competitor;
use Doctrine\ORM\EntityManagerInterface;
use League\Csv\Reader;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;
use Symfony\Component\Console\Style\SymfonyStyle;

/**
 * Class CsvImportCommand
 * @package AppBundle\ConsoleCommand
 */
class CsvImportCommand extends Command
{
    /**
     * @var EntityManagerInterface
     */
    private $em;

    /**
     * CsvImportCommand constructor.
     *
     * @param EntityManagerInterface $em
     *
     * @throws \Symfony\Component\Console\Exception\LogicException
     */
    public function __construct(EntityManagerInterface $em)
    {
        parent::__construct();

        $this->em = $em;
    }

    /**
     * Configure
     * @throws \Symfony\Component\Console\Exception\InvalidArgumentException
     */
    protected function configure()
    {
        $this
            ->setName('csv:import')
            ->setDescription('Imports the mock CSV data file')
        ;
    }

    /**
     * @param InputInterface  $input
     * @param OutputInterface $output
     *
     * @return void
     * @throws \Doctrine\ORM\NonUniqueResultException
     */
    protected function execute(InputInterface $input, OutputInterface $output)
    {
        $io = new SymfonyStyle($input, $output);

        $reader = Reader::createFromPath('%kernel.root_dir%/../src/AppBundle/Data/MOCK_DATA.csv');

        // https://github.com/thephpleague/csv/issues/208
        $results = $reader->fetchAssoc();

        foreach ($results as $row) {

            // create new athlete
            $athlete = (new Athlete())
                ->setFirstName($row['first_name'])
                ->setLastName($row['last_name'])
                ->setDateOfBirth(new \DateTime($row['date_of_birth']))
                ->setWeight($row['weight'])
            ;

            $this->em->persist($athlete);

            // create new Competitor
            $competitor = (new Competitor())
                ->setCategory($row['category'])
                ->setCompetition($row['competition'])
            ;

            $this->em->persist($competitor);

            $athlete->setCompetitor($competitor);
        }

        $this->em->flush();

        $io->success('Command exited cleanly!');
    }
}

Note here how we now use the column headings from the CSV file:

first_name,last_name,date_of_birth,category,weight,competition
Jesse,Tucker,10/29/1939,category 1,150,competition 3
Phyllis,Price,10/24/1930,category 2,78,competition 3
Matthew,Butler,10/16/1903,category 2,87,competition 3

E.g. ->setFirstName($row['first_name']) will be replaced with Jesse for the first row, Phyllis for the second, and so on.

Running this should work.

But we end up with some problems.

Firstly, we have 1000 rows in both tables. As covered in the previous video, we actually only expect 9 rows at most in the second table.

Secondly, if we re-run the command, we would end up with a duplicate for each row in both tables. Urgh.

Stopping Duplicates

Ok so if following along and you've accidentally filled your database full of duplicates, the easiest way to fix this is to either drop the tables in your GUI and recreate your schema, or simply drop, recreate, and re-schema update your database:

php bin/console doctrine:database:drop --force
php bin/console doctrine:database:create
php bin/console doctrine:schema:update --force

# one liner for the lazy:

php bin/console doctrine:database:drop --force && php bin/console doctrine:database:create && php bin/console doctrine:schema:update --force

Now, let's add in some duplicate checking:

<?php

namespace AppBundle\Command;

use AppBundle\Entity\Athlete;
use AppBundle\Entity\Competitor;
use Doctrine\ORM\EntityManagerInterface;
use League\Csv\Reader;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;
use Symfony\Component\Console\Style\SymfonyStyle;

/**
 * Class CsvImportCommand
 * @package AppBundle\ConsoleCommand
 */
class CsvImportCommand extends Command
{
    /**
     * @var EntityManagerInterface
     */
    private $em;

    /**
     * CsvImportCommand constructor.
     *
     * @param EntityManagerInterface $em
     *
     * @throws \Symfony\Component\Console\Exception\LogicException
     */
    public function __construct(EntityManagerInterface $em)
    {
        parent::__construct();

        $this->em = $em;
    }

    /**
     * Configure
     * @throws \Symfony\Component\Console\Exception\InvalidArgumentException
     */
    protected function configure()
    {
        $this
            ->setName('csv:import')
            ->setDescription('Imports the mock CSV data file')
        ;
    }

    /**
     * @param InputInterface  $input
     * @param OutputInterface $output
     *
     * @return void
     * @throws \Doctrine\ORM\NonUniqueResultException
     */
    protected function execute(InputInterface $input, OutputInterface $output)
    {
        $io = new SymfonyStyle($input, $output);

        $reader = Reader::createFromPath('%kernel.root_dir%/../src/AppBundle/Data/MOCK_DATA.csv');

        // https://github.com/thephpleague/csv/issues/208
        $results = $reader->fetchAssoc();

        foreach ($results as $row) {

            // create new athlete
            $athlete = (new Athlete())
                ->setFirstName($row['first_name'])
                ->setLastName($row['last_name'])
                ->setDateOfBirth(new \DateTime($row['date_of_birth']))
            ;

            $this->em->persist($athlete);

            // do a lookup for existing Competitor matching some combination of fields
            $competitor = $this->em->getRepository('AppBundle:Competitor')
                ->findOneBy([
                    'category' => $row['category'],
                    'competition' => $row['competition']
                ]);

            if ($competitor === null) {
                // or create new Competitor
                $competitor = (new Competitor())
                    ->setCategory($row['category'])
                    ->setCompetition($row['competition'])
                    ->setWeight($row['weight'])
                ;

                $this->em->persist($competitor);

                $this->em->flush();
            }

            $athlete->setCompetitor($competitor);
        }

        $this->em->flush();

        $io->success('Command exited cleanly!');
    }
}

This time we first check if the Competitor already exists. If not, only then would we create one.

Note that if a Competitor is created then we do a flush here. This is to stop any potential issues with subsequent loops not yet seeing the Competitor in the database (as it would not yet have been written to the DB).

This works quite well, and should de-duplciate the Competitor table. This should mean we only end up with 9 rows as expected.

However, this findOneBy operation is not cheap. We would want to add an index covering competition and category columns to ensure the lookup was as fast as possible. Please see this video for further information on indexes.

We still would get duplicated Athlete entities at this point, so I will leave it as an exercise for you to add in a similar check for Athlete entities, should you be so inclined.

Adding A Progress Bar

Sure, we've done the hard part - and hopefully you can see now it's not so hard if you break it down into manageable steps - so now let's make it a little more stylish.

We used the SymfonyStyle earlier. As part of this we get access to a cool progress bar, so let's add that in:

<?php

namespace AppBundle\Command;

use AppBundle\Entity\Athlete;
use AppBundle\Entity\Competitor;
use Doctrine\ORM\EntityManagerInterface;
use League\Csv\Reader;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;
use Symfony\Component\Console\Style\SymfonyStyle;

/**
 * Class CsvImportCommand
 * @package AppBundle\ConsoleCommand
 */
class CsvImportCommand extends Command
{
    /**
     * @var EntityManagerInterface
     */
    private $em;

    /**
     * CsvImportCommand constructor.
     *
     * @param EntityManagerInterface $em
     *
     * @throws \Symfony\Component\Console\Exception\LogicException
     */
    public function __construct(EntityManagerInterface $em)
    {
        parent::__construct();

        $this->em = $em;
    }

    /**
     * Configure
     * @throws \Symfony\Component\Console\Exception\InvalidArgumentException
     */
    protected function configure()
    {
        $this
            ->setName('csv:import')
            ->setDescription('Imports the mock CSV data file')
        ;
    }

    /**
     * @param InputInterface  $input
     * @param OutputInterface $output
     *
     * @return void
     * @throws \Doctrine\ORM\NonUniqueResultException
     */
    protected function execute(InputInterface $input, OutputInterface $output)
    {
        $io = new SymfonyStyle($input, $output);
        $io->title('Attempting import of Feed...');

        $reader = Reader::createFromPath('%kernel.root_dir%/../src/AppBundle/Data/MOCK_DATA.csv');

        // https://github.com/thephpleague/csv/issues/208
        $results = $reader->fetchAssoc();

        $io->progressStart(iterator_count($results));

        foreach ($results as $row) {

            // do a look up for existing Athlete matching first + last + dob
            // or create new athlete
            $athlete = (new Athlete())
                ->setFirstName($row['first_name'])
                ->setLastName($row['last_name'])
                ->setDateOfBirth(new \DateTime($row['date_of_birth']))
            ;

            $this->em->persist($athlete);

            // do a lookup for existing Competitor matching some combination of fields
            $competitor = $this->em->getRepository('AppBundle:Competitor')
                ->findOneBy([
                    'category' => $row['category'],
                    'competition' => $row['competition']
                ]);

            if ($competitor === null) {
                // or create new Competitor
                $competitor = (new Competitor())
                    ->setCategory($row['category'])
                    ->setCompetition($row['competition'])
                    ->setWeight($row['weight'])
                ;

                $this->em->persist($competitor);
            }

            $athlete->setCompetitor($competitor);

            $io->progressAdvance();
        }

        $this->em->flush();

        $io->progressFinish();
        $io->success('Command exited cleanly!');
    }
}

Most of this should be fairly self explanatory.

We must tell the progressStart method how many entries are in our progress count so it can properly display the progress bar. If we don't, it will display a really weird bar that jumps up and down and looks pretty lame. This is much more evident on larger datasets.

As we are working with an Iterator, we must use iterator_count instead of count.

At this stage we have a working implementation with some nice 'styling'.

In all likelihood we would want to extract most of this out into separate services, as already this command is doing rather too much. However, as this is a simple prototype / demonstration, I am going to leave this here.

Hopefully you have found this useful, but as ever, if you have any questions, comments, or feedback please do leave them in box below this, or any other relevant video, and I will do my very best to help you.

Episodes

# Title Duration
1 Setup and Manual Implementation 04:43
2 Importing CSV Data The Easy Way 04:44