- Article Type: General
- Product: Primo
- Product Version: 4
* The Primo pipe stopped with a pipe error. The Harvest Log shows this error:
2012-03-13 12:21:13,737 ERROR [t-SplitRecords] [c-TarGZReader] - Record too large. record size is: 404,112,200 max allowed record size: 314,572,800
* The problem file on the server is not the size reported in the error (404,112,200 bytes). The compressed file is only 23,929,419 bytes:
primo exlibris 23929419 Mar 13 06:14 sfx_primo_export.xml-marc.tar.gz (22.82mb)
* Data extract file is too large to be handled by Primo's file splitter
* The size of the raw (uncompressed) file is the problem, NOT the zipped or gzipped file
Split the single large file into smaller files using a 3rd-party tool.
"split" is a common tool and may already be available on your server.
Step 1: Run the split command (a counter, "aa", "ab", etc., will be appended to the new file prefix), for example:
>> split -b 300000000 fromFile.mrc tonewFilePrefix
This will create, for example: tonewFilePrefixaa, tonewFilePrefixab, tonewFilePrefixac, and so on.
Step 2: Combine the split files into a single package (such as .tar or .tar.gz), for example:
>> tar -cvzf totargz.tar.gz tonewFilePrefix*
This command will create a single tar-gzipped file which Primo will be able to process.
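The two steps above can be run end-to-end as a small script. This is only a sketch: the file names, the working directory, and the dummy input file are examples, and a small chunk size is used so the demo runs quickly; substitute your real export file and the 300,000,000-byte chunk size from Step 1.

```shell
#!/bin/sh
# Demo of the split-and-repackage flow, using example names and a small
# chunk size; use "split -b 300000000" for the real 300 MB limit.
mkdir -p /tmp/primo_split_demo
cd /tmp/primo_split_demo

# Stand-in for the real export file (1,000,000 bytes of dummy data).
head -c 1000000 /dev/zero > fromFile.mrc

# Step 1: split into 300,000-byte chunks named tonewFilePrefixaa, ...ab, etc.
split -b 300000 fromFile.mrc tonewFilePrefix

# Step 2: package all the chunks into a single tar.gz for Primo to ingest.
tar -czf totargz.tar.gz tonewFilePrefix*

ls -l totargz.tar.gz
```

A 1,000,000-byte input split into 300,000-byte chunks yields four files (three full chunks plus a 100,000-byte remainder), all of which land in the single totargz.tar.gz package.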
An alternative to splitting the files is to increase Advanced Configuration > General Configuration Wizard > primo.process.MaxUncompressedFileSize (the maximum file size). This allows Primo's file splitter to handle a larger file. We do NOT recommend this solution, for two reasons:
* File loading is done entirely in memory (RAM), so the Back Office must have sufficient memory allocated to process the larger file.
* Your input file can be expected to keep growing, and a size that works this month may no longer work next month. Setting up a file-split solution may be more complicated initially, but it is the more stable solution for the long term.
This should not be a problem coming from Alma, Aleph, or Voyager, because those systems split files to handle this limitation.
Files provided by other, external systems (such as other ILSs) may have this problem.
It is best if properly-sized files are generated by the source system, so that a record at the split point is not broken.
If splitting is done to an:
* XML file
* * The split files must be corrected before use, because valid XML requires a valid structure and the split files will not all have matching opening and closing tags.
* MarcExchange file or other non-XML file where the file structure is not an issue
* * The record at each split point will be broken if the split files are not corrected, but only that single record at each split point is lost, not the entire file.
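As an illustration of the XML case, a record-aware split keeps whole records together and re-adds the wrapper element so each output file is valid XML on its own. Everything here is a hedged sketch: the one-record-per-line layout, the record/collection element names, and the file names are assumptions for illustration, not Primo requirements.

```shell
#!/bin/sh
# Sketch: split an XML export whose records are already one per line,
# then wrap each piece so every split file is well-formed on its own.
mkdir -p /tmp/xml_split_demo
cd /tmp/xml_split_demo

# Sample input: four one-line record elements (assumed layout).
printf '<record>1</record>\n<record>2</record>\n<record>3</record>\n<record>4</record>\n' > records_lines.xml

# Split by whole lines (records), two records per output file, so no
# record is broken mid-way as it would be with a byte-based split.
split -l 2 records_lines.xml part_

# Re-add the wrapper element so each file has matching open/close tags.
for f in part_??; do
  { echo '<collection>'; cat "$f"; echo '</collection>'; } > "$f.xml"
  rm "$f"
done

ls part_*.xml
```

Splitting by lines (`-l`) rather than bytes (`-b`) is what keeps each record intact; the wrapper loop is what makes each piece independently valid XML.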
- Article last edited: 10/8/2013