Drupal node_save Limitations?

Once I got my simple coinbank module working, I decided to dive into another task that has been in the back of my mind for a couple of years. I run another Web site for my family that is going to be used for news about events, such as our family reunion. My mother has also been doing a lot of genealogy work and approached me some time ago about getting some of it published on the site.

So, I took a look around drupal.org and found a module that would presumably handle this for me: Family Tree 2. Unfortunately, its 6.x version is somewhat buggy and doesn't behave as one would expect. I decided to try my hand at fixing it.

Well, that didn't work out so well. Next, I decided to write my own "API"-type module that only processes GEDCOM files into serialized arrays of records and stores that data in separate tables, one per record type. This part is pretty solid: it can process my mother's GEDCOM file of almost 2,000 records. The problem comes when I try to leverage the Family Tree 2 module to do the actual node creation.
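For readers unfamiliar with GEDCOM, here is a rough PHP sketch of the kind of preprocessing described above: splitting the file into level-0 records and grouping them by record type. It is not the module from this post; `gedcom_group_records()` is a hypothetical name, and the regular expression only handles the basic "level, optional @XREF@, tag, optional value" line shape, not every feature of the GEDCOM spec (CONT/CONC continuation lines, for example, are simply stored as ordinary lines).

```php
<?php
// Hypothetical sketch: split GEDCOM text into level-0 records, group them
// by record type (INDI, FAM, ...), and serialize each record's lines.
function gedcom_group_records($text) {
  $groups = array();   // type => xref => serialized array of lines
  $current = null;     // record currently being collected
  $flush = function ($rec) use (&$groups) {
    if ($rec !== null) {
      $groups[$rec['type']][$rec['xref']] = serialize($rec['lines']);
    }
  };

  foreach (preg_split('/\r\n|\r|\n/', $text) as $raw) {
    $line = trim($raw);
    // A GEDCOM line is: level, optional @XREF@, tag, optional value.
    if (!preg_match('/^(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s+(.*))?$/', $line, $m)) {
      continue; // blank or malformed line
    }
    $level = (int) $m[1];
    $value = isset($m[4]) ? $m[4] : '';
    if ($level === 0) {
      // A new top-level record starts; store the previous one.
      $flush($current);
      $current = array('type' => $m[3], 'xref' => $m[2], 'lines' => array());
    } elseif ($current !== null) {
      $current['lines'][] = array('level' => $level, 'tag' => $m[3], 'value' => $value);
    }
  }
  $flush($current);
  return $groups;
}

$sample = "0 HEAD\n0 @I1@ INDI\n1 NAME John /Smith/\n0 @I2@ INDI\n1 NAME Mary /Jones/\n"
        . "0 @F1@ FAM\n1 HUSB @I1@\n1 WIFE @I2@\n0 TRLR\n";
$groups = gedcom_group_records($sample);
echo count($groups['INDI']), ' INDI, ', count($groups['FAM']), " FAM\n"; // 2 INDI, 1 FAM
```

Each serialized string could then be written to a per-type table; that database step is left out of the sketch.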

With my smaller test GEDCOM file (about 12 records: 5 INDI and 2 FAM, which are all Family Tree 2 cares about for now), everything works perfectly. My record tables come out right, AND a node gets created for each INDI and FAM, with each INDI correctly linked to its "ancestor group", or FAM.

When I try to process the "big one", which has approximately 1,500 INDIs and 500 FAMs, things break down horribly. My piece works well, since all the records are snug in my tables. However, when I start looping through them to create the nodes (a simple modification to Family Tree 2's foreach loop), Drupal times out after about 300 INDI records. The file reported as having the problem varies from run to run, though it is usually some Drupal core include file. The number of records processed successfully also differs each time: 307, 311, 302, etc.

How do you Drupal Masters use 'node_save' to positive effect when you need to process a LOT of nodes?

Comments

PHP.ini or Batch API

If it's one big import that you only need to run once, you can try increasing PHP's memory limit and maximum execution time. If your host allows it, you can change these in an .htaccess file or even with ini_set(), I believe.
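A minimal sketch of the ini_set() route, assuming the host permits changing these settings at runtime. The 256M and 300-second values are arbitrary examples, not figures from this thread:

```php
<?php
// Raise limits for a one-off import. Example values only; many shared
// hosts cap these settings or disallow changing them.
ini_set('memory_limit', '256M'); // returns the old value, or false on failure
set_time_limit(300);             // restarts the timer with a 300-second limit

// The .htaccess equivalent under mod_php would look like (example values):
//   php_value memory_limit 256M
//   php_value max_execution_time 300

echo ini_get('memory_limit'), ' / ', ini_get('max_execution_time'), "\n";
```

Because set_time_limit() restarts the timeout counter from zero each time it is called, calling it inside a long node_save() loop also buys each iteration more time, though that only postpones the problem if memory keeps growing.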

If it's lots of individual node_save() calls, you could break them up and give the Batch API a try. The documentation is not great, but I have been playing around with it lately and have found some great uses for it. The Batch API is a set of helper functions that set up a queue of operations and then report the progress of those operations on a JavaScript-enabled page.
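To illustrate the chunking idea, here is a plain-PHP sketch of a Drupal 6 style batch operation callback that handles a fixed number of records per pass and reports progress through `$context`. The `$context` keys (`sandbox`, `results`, `finished`, `message`) follow Drupal 6 Batch API conventions, but `mymodule_create_nodes_op()`, `mymodule_save_record()` and `run_batch_locally()` are hypothetical names. The last one only imitates the batch engine so the sketch runs outside Drupal, and the chunk size of 50 is an assumption to tune.

```php
<?php
// Hypothetical stand-in for node creation. In a real Drupal 6 module this
// would build a $node object from the stored GEDCOM record and call
// node_save($node); here it just returns the ID so the sketch runs anywhere.
function mymodule_save_record($record_id) {
  return $record_id;
}

// Batch operation callback, Drupal 6 style: the batch engine calls it
// repeatedly with the same $context until $context['finished'] >= 1.
// In Drupal it would be registered roughly like this (in a form submit
// handler; elsewhere you would also call batch_process()):
//   batch_set(array(
//     'title' => t('Creating family tree nodes'),
//     'operations' => array(array('mymodule_create_nodes_op', array($record_ids))),
//   ));
function mymodule_create_nodes_op(array $record_ids, &$context) {
  $per_pass = 50; // hypothetical chunk size: small enough for one request

  // The sandbox persists between passes of this operation.
  if (!isset($context['sandbox']['progress'])) {
    $context['sandbox']['progress'] = 0;
    $context['sandbox']['max'] = count($record_ids);
  }

  $chunk = array_slice($record_ids, $context['sandbox']['progress'], $per_pass);
  foreach ($chunk as $id) {
    $context['results'][] = mymodule_save_record($id);
    $context['sandbox']['progress']++;
  }
  $context['message'] = 'Processed ' . $context['sandbox']['progress']
    . ' of ' . $context['sandbox']['max'];

  // Fraction complete; a value below 1 means "call me again".
  $context['finished'] = $context['sandbox']['max'] > 0
    ? $context['sandbox']['progress'] / $context['sandbox']['max']
    : 1;
}

// Illustration only: imitates the batch engine by calling the operation
// until it reports completion. Drupal runs each pass in a separate HTTP
// request, which is what keeps any single request under the time limit.
function run_batch_locally($operation, array $record_ids) {
  $context = array('sandbox' => array(), 'results' => array(), 'finished' => 1, 'message' => '');
  $passes = 0;
  do {
    $context['finished'] = 1; // default: done unless the callback says otherwise
    $operation($record_ids, $context);
    $passes++;
  } while ($context['finished'] < 1);
  return array($context['results'], $passes);
}

list($results, $passes) = run_batch_locally('mymodule_create_nodes_op', range(1, 120));
echo count($results) . " records in $passes passes\n"; // 120 records in 3 passes
```

The design point is that no single pass touches more than `$per_pass` records, so the per-request cost stays flat no matter how large the GEDCOM file is.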

Wow...

I guess I need to see what I can do about being notified when a comment is queued for approval; this one had been sitting there for a few weeks :D

Anyway, I was eventually able to get the Batch API to work. I had to heavily modify the family module to do it, but I did manage it. The project has been on hold for some weeks while I do my RealJob™, but I'm pretty stoked about it.