1 million rows and SAINT still wants more


While this might be a quickie, it’s a biggy.  A big one in terms of the amount of data just uploaded through SAINT.  In fact, we’ve just uploaded around 1 million rows of data, with 6 columns per row.

And it didn’t even blink!  Gotta love that!

So why do we have a million rows of data?
Customer segmentation of course.

This was actually done for one of our other clients.

The rationale?

To segment conversions and transactions by customer type, segment, previous segment, needs group etc.  And SAINT enables that capability.

How?

First, create an eVar to hold the raw identifier.  This might be an account number, a customer ID etc.  Then, using the admin, create the classifications on the eVar for the relevant columns you need.  At this point I always create the classification hierarchy as well, just so I can envision how I want the data to be reported and drilled down through.

When you create the classifications, the SAINT file is also created and made available for download.

I opened the SAINT template in Excel and copied my customer segment data into it in blocks of 100,000 records.  There are a number of reasons for this, not the least of which is to keep the file size down, but also to make it easier if an upload does fail: 100,000 rows are a lot easier to deal with than 1 million.

So I’ve now got 10 files, each containing 100,000 rows and 6 columns of data per row.  Each file was about 5 MB.
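If copying blocks around in Excel gets tedious, the split can be scripted.  Here's a minimal sketch, not the process used above.  It assumes the data is already in a single tab-delimited file whose first line is the header row; the function name, file names and the header_rows parameter are made up for illustration.  If your SAINT template carries extra header lines, raise header_rows so they get repeated in every chunk.

```python
from itertools import islice
from pathlib import Path


def split_saint_file(source, out_dir, rows_per_file=100_000, header_rows=1):
    """Split a tab-delimited file into numbered chunks.

    The first `header_rows` lines are copied to the top of every chunk.
    Lines are written back verbatim, so no quoting or escaping is changed.
    """
    source, out_dir = Path(source), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with source.open(encoding="utf-8", newline="") as src:
        header = list(islice(src, header_rows))
        part = 0
        while True:
            # Pull the next block of data rows; an empty block means we're done.
            chunk = list(islice(src, rows_per_file))
            if not chunk:
                break
            part += 1
            path = out_dir / f"{source.stem}_{part:02d}.tab"
            with path.open("w", encoding="utf-8", newline="") as out:
                out.writelines(header)
                out.writelines(chunk)
            written.append(path)
    return written
```

Calling split_saint_file("segments.tab", "chunks") on a 1 million row file would give ten files named segments_01.tab through segments_10.tab, each with the header on top.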

You can’t upload that much data through the browser, so you need to use the FTP Import capability.

In the SAINT admin, select Import File, click on FTP Import and then Add New.

You’ll then get a popup that asks you to select a bunch of things to create an FTP account.

Select the data to be classified, move the report suite or suites to the box on the right, select the import options and add in your email address.

Check the box and hit save.

A new FTP account has just been created on the Omniture servers and you’ll get a confirmation screen showing the address, username and password.

Connect to that account with an FTP client and upload your SAINT files to the FTP server.

You’re not quite done yet though.

You also need to create a series of empty files with a .fin extension, one for each SAINT file and named exactly the same apart from the extension.  These are “finish” files and are crucial to the upload.  They’re completely empty files, and any text editor can create them.  Just make sure the names match exactly, including case.

Upload those .fin files and you’re done.
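With ten files, creating the .fin files by hand is an easy place to slip up on a name.  Here's a small sketch that builds them from whatever data files are in a folder.  It assumes the data files end in .tab; the function name and folder layout are my own invention.

```python
from pathlib import Path


def create_fin_files(directory, data_suffix=".tab"):
    """Create an empty .fin file next to every data file in `directory`.

    The .fin name is derived from the data file name, so the case
    always matches exactly.
    """
    created = []
    for data_file in sorted(Path(directory).glob(f"*{data_suffix}")):
        fin = data_file.with_suffix(".fin")
        fin.write_bytes(b"")  # guarantees a zero-byte file, even if one already existed
        created.append(fin)
    return created
```

Run it against the folder holding the chunks and you get one zero-byte .fin per .tab file, ready to upload.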

Now, go have a coffee, have some lunch or dinner or whatever and come back later.

Progress

You can get a rough idea of progress by refreshing the FTP list of files.  Omniture removes the files from the FTP directory when it begins to process them, so whatever is still listed hasn’t been picked up yet.
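The same check can be scripted instead of hitting refresh.  This is only a sketch built on the behaviour described above (files disappear once processing starts), and the function name is hypothetical.  It expects an already logged-in ftplib.FTP object and uses its nlst() call to list the directory.

```python
from ftplib import error_perm


def files_still_waiting(ftp, uploaded_names):
    """Return the uploaded file names that are still in the FTP directory."""
    try:
        listing = set(ftp.nlst())
    except error_perm as exc:
        # Some FTP servers answer 550 when the directory is empty.
        # Treat that as "nothing left"; re-raise anything else.
        if not str(exc).startswith("550"):
            raise
        listing = set()
    return [name for name in uploaded_names if name in listing]
```

An empty list back means everything has at least been picked up.  It doesn't mean the import has finished, so the confirmation email is still the thing to wait for.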

Time Frame

I uploaded the files around 4pm.

At 10:30pm I did a data extract by FTP of all data to see where it was up to…it was done.  Shortly thereafter, I got an email saying it was done, without any failures.

Easy as pie.  No muss no fuss.

While we’re using customer segments, it could just as easily have been customer demographics, technographics or any other form of data.  The point is, 1 million rows and it didn’t even blink.

There are a few things to watch out for though when importing that much data.

There is a limit on the number of unique values (500,000) that will be reported against in a given month.  We’re OK; we won’t hit that limit.

Recommendations are that file sizes be kept under 30 MB for the initial load, and then subsequent refreshes less than 5 MB.  So we’re still OK.
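A quick size check before uploading is cheap insurance.  A minimal sketch, assuming a megabyte means 1024 × 1024 bytes (the recommendation doesn't say which definition it uses) and using a made-up function name:

```python
from pathlib import Path


def oversized_files(paths, limit_mb=30):
    """Return the names of files larger than `limit_mb` megabytes."""
    limit_bytes = limit_mb * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
    return [Path(p).name for p in paths if Path(p).stat().st_size > limit_bytes]
```

Use the default of 30 for the initial load and pass limit_mb=5 when checking refresh files.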

And the import time will vary depending on many things, including how busy their import routines are.  You get in the queue and everyone loves a queue.

But that was it.  1 million rows of customer data now available for segmentation nirvana in SiteCatalyst – and DataWarehouse, and Discover, and Test and Target.  We’re off to the races!

And while the first run of this was a manual run, future updates can easily be automated now that the FTP site is created.  Just remember your .tab and .fin files must be named the same.
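For the automated route, here's one way the upload step might look in Python using the standard ftplib module.  Treat it as a sketch: the function name is hypothetical, and the host, username and password have to come from your own confirmation screen.  It sends each data file first and its empty .fin straight after, so the finish file never arrives before the data it belongs to.

```python
import io
from pathlib import Path


def upload_with_fin(ftp, data_files):
    """Upload each data file, followed by an empty .fin with the same base name.

    `ftp` is a logged-in ftplib.FTP connection (or anything else that
    offers the same storbinary method).
    """
    uploaded = []
    for data_file in map(Path, data_files):
        with data_file.open("rb") as fh:
            ftp.storbinary(f"STOR {data_file.name}", fh)
        # The .fin is generated in memory: zero bytes, same name, same case.
        fin_name = data_file.with_suffix(".fin").name
        ftp.storbinary(f"STOR {fin_name}", io.BytesIO(b""))
        uploaded.extend([data_file.name, fin_name])
    return uploaded
```

To use it, open a connection with ftplib.FTP(host), call login(username, password) with the details Omniture gave you, and pass the connection in along with the list of .tab files.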

