Hi, did ya miss me? I’m blogging from O’Hare today; it’s going to be a busy week for me with lots of air travel. That won’t stop me from polluting the Internet with yet another blog post, though. Don’t you worry!
One of the things we sometimes encounter is a binary file created by a C program that wrote a struct straight to disk. If we want any of that data, we need to parse the file and read that struct back out. It’s a little tricky, and you kinda have to know a bit about either the system the program was run on or the deliberate decisions the program made about how the file was written. The most important thing you need to know is the endianness of the binary data, followed closely by how the compiler padded and aligned the struct’s members.
I’m making two assumptions in the code that follows:
- You’re on an Intel chip (x86, which stores multi-byte values little-endian)
- You’re compiling the program as 64-bit (which means members can be aligned on boundaries of up to 8 bytes)
Here’s the Code
There are three files you’re going to get here:
- test_t.h – The header file with the C structure in it
- test_t.c – The C program which writes the data file
- test_t_experiment.pl – The Perl program which reads the file
Please note that the Perl script has one dependency, Convert::Binary::C, which you can find on CPAN.
So the first thing we have to do is generate the file. We do that with a super simple C program. Then we read the file. Here’s what we do in the experiment:
- Create our instance of Convert::Binary::C and tell it which header file to use.
- Tell that instance of the convert module that the .two member of the struct is a C string
- Print the size of the structure (which is our record length)
- Print a quick summary of how the convert module sees the struct’s layout
- Open and read the file
- For each record, print the data as the convert module unpacks it into the struct type named on the command line
- Make a special point of printing the .two member of the struct, to demonstrate that it is indeed being read into Perl as a string
We all have those little utilities we write at home to scratch an itch we had earlier at work, and this is one (of many) I’ve written over the last few years. This code is a bit less refined than what I normally post, I know, but I’m sitting in an airport right now and I did promise a Gist per Day when I could… and I managed to pull it off today.
I haven’t tested the performance of this all that thoroughly, but I have used the utility I wrote on files of up to 200K records without much more of a delay than a bathroom break. That was just reading the file and unpacking each record from the struct, though.
Please be nice to my rushed code sample. I know it’s kinda awful, but it does work (I just tested it).