Splitting files with dd

We have an ESXi box hosted with Rackspace. It took a bit of pushing to get them to install ESXi in the first place, as they tried to get us to use their cloud offering. But this is a staging environment, and we want something dedicated on hardware we control so we can get an idea of performance without other people’s workloads muddying the water.

Anyway, I’ve been having a bit of fun getting our server template uploaded to it, which is only 11GB compressed – not exactly large, but apparently large enough to be inconvenient.

In my experience the datastore upload tool in the vSphere client frequently fails on large files. In this case I was getting the “Failed to log into NFC server” error, which is probably due to a required port not being open. I didn’t like that tool anyway, so moving on.

The trusty-but-slow scp method was also failing, however. Uploads would start but consistently stall at about the 1GB mark. I’m not sure if it’s a buffer or something similar filling up in dropbear (which is designed to be a lightweight ssh server and really shouldn’t need to deal with files this large), and Googling didn’t turn up much.

So I went down the track of splitting up the file into smaller chunks, easily done using the split tool. Except I didn’t know about the split tool and used dd.
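
For reference, here is a minimal sketch of the split approach, run on a small throwaway file so it is safe to try anywhere. The file name `devbox.tgz` and the 1000-byte chunk size are placeholders for the demo; for a real archive a chunk size such as `-b 512m` would be the closer equivalent, assuming your split accepts that suffix.

```shell
#!/bin/sh
set -e

# Work in a scratch directory so nothing real is touched
WORKDIR=$(mktemp -d)
cd "$WORKDIR"

# 2500 bytes of dummy data standing in for the large archive (hypothetical name)
dd if=/dev/zero of=devbox.tgz bs=500 count=5 2>/dev/null

# -b sets the chunk size in bytes, -a 3 asks for three-character suffixes,
# so the parts are named devbox.tgz.partaaa, devbox.tgz.partaab, ...
split -b 1000 -a 3 devbox.tgz devbox.tgz.part

ls devbox.tgz.part*
```

The alphabetic suffixes sort in creation order, so `cat devbox.tgz.part* > devbox.tgz` at the other end puts the pieces back together.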

So if you’re reading this you probably want to use split, but if for some reason you need to do this in an environment that doesn’t have split and does have dd (like ESXi), this could help:

[shell]
#!/bin/bash

FILE="$1"

# How big we want the chunks to be, in bytes
CHUNKSIZE=$(( 512 * 1024 * 1024 ))

# Block size for dd, in bytes
BS=$(( 8 * 1024 ))

# Convert CHUNKSIZE to blocks
CHUNKSIZE=$(( CHUNKSIZE / BS ))

# Skip value for dd, we start at 0
SKIP=0

# Calculate total size of file in blocks, rounding up so a trailing partial block is counted
FSIZE=$(stat -c%s "$FILE")
SIZE=$(( (FSIZE + BS - 1) / BS ))

# Loop counter for file name
i=0

echo "Using chunks of $CHUNKSIZE blocks"
echo "Size is $FSIZE bytes = $SIZE blocks"

# -lt rather than -le, so a size that is an exact multiple of the chunk size
# does not produce an empty extra part
while [ "$SKIP" -lt "$SIZE" ]
do
    NEWFILE=$(printf '%s.part%03d' "$FILE" "$i")
    i=$(( i + 1 ))

    echo "Creating file $NEWFILE starting after block $SKIP"
    dd if="$FILE" of="$NEWFILE" bs="$BS" count="$CHUNKSIZE" skip="$SKIP"

    SKIP=$(( SKIP + CHUNKSIZE ))
done
[/shell]
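
To make the block arithmetic concrete, here is a small worked example. The file size of exactly 11 GiB is an assumption standing in for the "11GB" template, and rounding up (rather than truncating) is a choice made here so that a trailing partial block or chunk is still counted.

```shell
#!/bin/sh
# Hypothetical file size: exactly 11 GiB
FSIZE=$(( 11 * 1024 * 1024 * 1024 ))

# Same chunk and block sizes as the split script
CHUNK_BYTES=$(( 512 * 1024 * 1024 ))
BS=$(( 8 * 1024 ))

# Blocks per chunk: 512 MiB / 8 KiB
CHUNK_BLOCKS=$(( CHUNK_BYTES / BS ))

# Total blocks in the file, rounded up
TOTAL_BLOCKS=$(( (FSIZE + BS - 1) / BS ))

# Number of part files, rounded up
PARTS=$(( (TOTAL_BLOCKS + CHUNK_BLOCKS - 1) / CHUNK_BLOCKS ))

echo "blocks per chunk: $CHUNK_BLOCKS"
echo "total blocks:     $TOTAL_BLOCKS"
echo "parts:            $PARTS"
```

With these numbers each chunk is 65536 blocks, the file is 1441792 blocks, and the result is 22 parts.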

Afterwards:

scp ./*.part* user@host:/vmfs/datastore/

Then at the other end you simply concatenate them together. I generated the list of files with `ls -tr1 *.part*` and pasted it into a script. The order is critical: `-t` sorts by modification time and `-r` reverses that to oldest first, which gives the correct order as long as the timestamps reflect the order in which the parts were created.

[shell]
#!/bin/bash

#FLIST=`ls -tr1 *.part*`
FLIST="devbox.tgz.part0 devbox.tgz.part1 devbox.tgz.part2 devbox.tgz.part3 devbox.tgz.part4 devbox.tgz.part5 devbox.tgz.part6 devbox.tgz.part7 devbox.tgz.part8 devbox.tgz.part9 devbox.tgz.part10 devbox.tgz.part11 devbox.tgz.part12 devbox.tgz.part13 devbox.tgz.part14 devbox.tgz.part15 devbox.tgz.part16 devbox.tgz.part17 devbox.tgz.part18 devbox.tgz.part19 devbox.tgz.part20 devbox.tgz.part21"

OUTPUT="output.tgz"

# FLIST is deliberately unquoted so it splits into one file name per word
for F in $FLIST
do
    cat "$F" >> "$OUTPUT"
done
[/shell]
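
When the parts carry zero-padded numbers (`.part000`, `.part001`, ...), the shell glob already expands in the right order, so the hand-built list can be skipped. The sketch below shows that on tiny stand-in files and then compares checksums of the original and the reassembled file. The file names are placeholders, and `cksum` is used only because it is a POSIX tool; substitute whichever checksum tool exists at both ends.

```shell
#!/bin/sh
set -e

# Work in a scratch directory so nothing real is touched
WORKDIR=$(mktemp -d)
cd "$WORKDIR"

# Tiny stand-ins for the original archive and its parts (hypothetical names)
printf 'aaaabbbbcc' > original.tgz
printf 'aaaa' > devbox.tgz.part000
printf 'bbbb' > devbox.tgz.part001
printf 'cc'   > devbox.tgz.part002

# The glob expands in sorted order, so the parts are read in sequence
cat devbox.tgz.part* > output.tgz

# Reading from stdin keeps the file name out of the cksum output
ORIG_SUM=$(cksum < original.tgz)
NEW_SUM=$(cksum < output.tgz)

if [ "$ORIG_SUM" = "$NEW_SUM" ]; then
    echo "checksums match"
else
    echo "checksum mismatch" >&2
    exit 1
fi
```

In the real case the original lives on the other machine, so you would run the checksum there and compare the two values by eye. Note that this only works with padded numbers: with names like `part2` and `part10`, the glob would put `part10` first.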

13 thoughts on “Splitting files with dd”

  1. Greg Larkin

    Excellent, this saved me having to put a script together myself. I have been backing up huge vmdk files to Rackspace CloudFiles, and they have a 5GB per object limit.

    Cheers,
    Greg

  2. anonymoose

    Cheers for the tip. This came up on a search and was useful for recovering some simulations that I had accidentally appended to, rather than overwritten

  3. fuchur

    that script really helped me – thanks!

    Just a little improvement for the split:
    NEWFILE=$(printf "$FILE.part%03d" $i)

    1. Chad Jay Harrington

      Here’s what I did to resolve my concerns… Also, in your post your files indicate that you ran another program as well (tgz)….

      I also had problems with the printf of the FILE name putting quotes in the filenames at the beginning and end of every part…

      I am trying to get an image of a 240GB SS HD and a 1 TB SS HD, split into pieces so I can burn the pieces onto ~25GB Blu-ray discs and do the Autopsy later when I have more money.

      Here’s my code variant:

      #!/bin/bash
      FILE=$1
      NEWFILE_STUB=$2

      #How big we want the chunks to be in bytes
      CHUNKSIZE=$(( 4096 * 1024 * 1024 ))
      #Block size for dd in bytes
      BS=$(( 8 * 1024 ))
      #Convert CHUNKSIZE to blocks
      CHUNKSIZE=$(( $CHUNKSIZE / $BS ))
      # Skip value for dd, we start at 0
      SKIP=0
      #Calculate total size of file in blocks, rounding up to count a trailing partial block
      FSIZE=$(stat -c%s "$1")
      SIZE=$(( ($FSIZE + $BS - 1) / $BS ))
      #Loop counter for file name
      i=0
      echo "Using chunks of $CHUNKSIZE blocks"
      echo "Size is $FSIZE bytes = $SIZE blocks"
      #-lt rather than -le, so an exact multiple of the chunk size does not produce an empty extra part
      while [ $SKIP -lt $SIZE ]
      do
      NEWFILE=$(printf "$NEWFILE_STUB.part%03d" $i)
      NEWFILE=$(basename "$NEWFILE")
      i=$(( $i + 1 ))
      echo "Creating file $NEWFILE starting after block $SKIP"
      dd if="$FILE" of="$NEWFILE" bs="$BS" count="$CHUNKSIZE" skip=$SKIP
      SKIP=$(( $SKIP + $CHUNKSIZE ))
      done

  4. Chad Jay Harrington

    In case of premature errors where not all the pieces are generated, perhaps we could provide an option that detects that some of the parts already exist and asks whether we would like to regenerate them all or just the missing ones. That way we don’t waste any time unnecessarily.

  5. Christian Rakotondratsima

    Hello Alex,
    We used it a lot, thank you.
    Here is our variant; we sometimes ran into issues with the previous version.
    I also added the merging (concatenation).
    It works for both splitting and merging, if one uses the same name for the two parameters.
    – the splitting
    ===========================
    #!/bin/bash
    #From Al4, Alex https://blog.al4.co.nz/2011/03/esxi-and-splitting-a-file-with-dd/
    #Improved by Chad Jay Harrington
    #Improved by CRA Miro K.E.
    FILE=$1
    NEWFILE_STUB=$2
    #How big we want the chunks to be in bytes, here is for 256 mb
    CHUNKSIZE=$(( 256 * 1024 * 1024 ))
    #Block size for dd in bytes
    BS=$(( 8 * 1024 ))
    echo "block size: $BS"
    #Convert CHUNKSIZE to blocks
    CHUNKSIZE=$(( $CHUNKSIZE / $BS ))
    echo "chunksize: $CHUNKSIZE"
    # Skip value for dd, we start at 0
    SKIP=0
    #Calculate total size of file in blocks, rounding up to count a trailing partial block
    FSIZE=$(stat -c%s "$1")
    echo "file size: $FSIZE"
    SIZE=$(( ($FSIZE + $BS - 1) / $BS ))
    #Loop counter for file name
    i=0
    echo "Using chunks of $CHUNKSIZE blocks"
    echo "Size is $FSIZE bytes = $SIZE blocks"
    #-lt rather than -le, so an exact multiple of the chunk size does not produce an empty extra part
    while [ $SKIP -lt $SIZE ]
    do
    NEWFILE=$(printf "$NEWFILE_STUB.part%05d" $i)
    NEWFILE=$(basename "$NEWFILE")
    i=$(( $i + 1 ))
    echo "Creating file $NEWFILE starting after block $SKIP"
    dd if="$FILE" of="$NEWFILE" bs="$BS" count="$CHUNKSIZE" skip=$SKIP
    SKIP=$(( $SKIP + $CHUNKSIZE ))
    done
    ===========================
    – the merging
    ===========================
    #!/bin/bash
    SPLITTED=$1
    MERGED=$2
    FLIST=$(ls -tr1 $SPLITTED*)
    OUTPUT=$MERGED
    for F in $FLIST
    do
    cat $F >> $OUTPUT
    done
    ===========================
    All the best.
    Christian
