Obscure Data Formats: .px files

I was answering some data requests this morning when I came across a download option on an open data site I wasn't familiar with:

Download PC-Axis file

PC-Axis…? What the hell is a PC-Axis file?

As it turns out, PC-Axis is a statistical program developed by Statistics Sweden, the Swedish government's statistics agency. It is used in Croatia, Denmark, Estonia, Finland, Ireland, Iceland, Latvia, Lithuania, Norway, Greenland, Slovakia, Slovenia, Spain, Sweden, Taiwan, the Philippines, Kuwait, Algeria, Mozambique, Namibia, Uganda, South Africa, Tanzania, Bolivia and Brazil, and by organizations like the UN Economic Commission for Europe, the East African Community and the FAO (Kyrgyzstan, Ghana, Kenya).

That’s surprisingly prevalent for a small freeware program only available to Windows users.

And since the data I wanted was in this strange .px format and there were no other options, I had to open it up and figure it out.

PX files are basically CSV files with a space as a delimiter and a whole bunch of metadata at the top.
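"CSV with metadata on top" is a handy simplification. Under the hood, the metadata is a series of `KEYWORD=value;` entries with quoted strings and comma-separated lists, and the cells come last, after `DATA=`. Here's a minimal Python sketch of a parser. It assumes the default language only (no `[lang]` suffixes on keywords), that no quoted string contains `=`, and it leaves quoted missing-value markers in DATA (such as `".."`) untouched. `parse_px` and `_split_outside_quotes` are names made up for this example, not part of any library.

```python
import re

def _split_outside_quotes(text, sep):
    """Split text on sep, ignoring separators that appear inside double quotes."""
    parts, buf, in_quotes = [], [], False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == sep and not in_quotes:
            parts.append(''.join(buf))
            buf = []
        else:
            buf.append(ch)
    parts.append(''.join(buf))
    return parts

def _unquote(item):
    """Join the quoted chunks of one list item (long strings can be split into several)."""
    chunks = re.findall(r'"([^"]*)"', item)
    return ''.join(chunks) if chunks else item.strip()

def parse_px(text):
    """Map each raw keyword (e.g. 'STUB' or 'VALUES("year")') to a list of values."""
    meta = {}
    for entry in _split_outside_quotes(text, ';'):
        entry = entry.strip()
        if not entry:
            continue
        key, _, raw = entry.partition('=')
        key = key.strip()
        if key == 'DATA':
            meta[key] = raw.split()  # cells are whitespace-separated
        else:
            meta[key] = [_unquote(item) for item in _split_outside_quotes(raw, ',')]
    return meta

sample = '''CHARSET="ANSI";
STUB="year","month";
HEADING="measure";
VALUES("year")="2014","2015";
DATA=
10 20 11 21;
'''
meta = parse_px(sample)
print(meta['STUB'])            # ['year', 'month']
print(meta['VALUES("year")'])  # ['2014', '2015']
print(meta['DATA'])            # ['10', '20', '11', '21']
```

Files with `CHARSET="ANSI"` are often in a Windows code page (check the `CODEPAGE` keyword too), so pick the encoding accordingly when reading the file, e.g. `open("table.px", encoding="cp1252")`.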

Sweet! Fire up the Python scripts~

However, the data was all in German and the spec for the PX format left much to be desired. So for those who encounter it later on, here's what you need to know:

  • All the data is, conveniently enough, in the section marked DATA
  • The column headers live under HEADING, but it may only contain the variable name(s); the actual labels are defined in the matching VALUES entry of the metadata
  • The section named STUB is the equivalent of a y-axis, so you can think of it as extra columns holding categories the user might want to filter by. The data I was looking at was broken down by year and by month, so the STUB variables were year and month. As with HEADING, the actual labels are defined in a VALUES entry of the metadata
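To see how STUB, HEADING and VALUES fit together: every combination of STUB values is one row, every combination of HEADING values is one column, and DATA lists the cells row by row, with the last variable in each list changing fastest. A small Python sketch with hypothetical metadata (the variable names and numbers are invented for illustration, and `axis_labels` is a made-up helper):

```python
from itertools import product

def axis_labels(meta, keyword):
    """Return every label combination for the variables listed under STUB or HEADING.

    PX stores cells row by row, and the last variable in each list varies
    fastest, which is the order itertools.product produces.
    """
    variables = meta.get(keyword, [])
    return list(product(*(meta[f'VALUES("{var}")'] for var in variables)))

# Hypothetical metadata, shaped like the output of a simple PX parser
meta = {
    'STUB': ['year', 'month'],
    'HEADING': ['measure'],
    'VALUES("year")': ['2014', '2015'],
    'VALUES("month")': ['Jan', 'Feb'],
    'VALUES("measure")': ['visitors', 'nights'],
    'DATA': ['10', '20', '11', '21', '12', '22', '13', '23'],
}
rows = axis_labels(meta, 'STUB')     # [('2014', 'Jan'), ('2014', 'Feb'), ('2015', 'Jan'), ('2015', 'Feb')]
cols = axis_labels(meta, 'HEADING')  # [('visitors',), ('nights',)]
assert len(rows) * len(cols) == len(meta['DATA'])

# The cell for row i, column j sits at position i * len(cols) + j in DATA
print(rows[1], cols[0], meta['DATA'][1 * len(cols) + 0])  # ('2014', 'Feb') ('visitors',) 11
```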

Converting PX to CSV is actually pretty simple: replace the spaces in the DATA section with commas, copy the HEADING values to the top of the file and delete the metadata. If you want to keep the STUB columns, a quick Python script that reads the CSV as a dictionary, adds the STUB values and writes out another CSV file will do the trick.
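As an alternative to the find-and-replace route, here's a sketch of that kind of script working straight from parsed metadata. It assumes a dict shaped like `{'STUB': [...], 'HEADING': [...], 'VALUES("var")': [...], 'DATA': [...]}` (a hypothetical structure, not a standard one; the sample values are invented) and writes the STUB variables as leading columns followed by one column per HEADING combination. `px_meta_to_csv` is a made-up name.

```python
import csv
import io
from itertools import product

def px_meta_to_csv(meta, out):
    """Write PX cells as CSV: STUB variables first, then one column per HEADING combination."""
    def combos(keyword):
        # Cartesian product of the VALUES lists; the last variable varies fastest
        return list(product(*(meta[f'VALUES("{v}")'] for v in meta.get(keyword, []))))

    rows, cols = combos('STUB'), combos('HEADING')
    data = meta['DATA']
    if len(rows) * len(cols) != len(data):
        raise ValueError(f'expected {len(rows) * len(cols)} cells, found {len(data)}')

    writer = csv.writer(out)
    # Multi-variable headings are joined with a space, e.g. "visitors 2014"
    writer.writerow(meta.get('STUB', []) + [' '.join(c) for c in cols])
    for i, row_labels in enumerate(rows):
        writer.writerow(list(row_labels) + data[i * len(cols):(i + 1) * len(cols)])

meta = {
    'STUB': ['year', 'month'],
    'HEADING': ['measure'],
    'VALUES("year")': ['2014', '2015'],
    'VALUES("month")': ['Jan', 'Feb'],
    'VALUES("measure")': ['visitors', 'nights'],
    'DATA': ['10', '20', '11', '21', '12', '22', '13', '23'],
}
buf = io.StringIO()
px_meta_to_csv(meta, buf)
print(buf.getvalue())
```

This prints a header row `year,month,visitors,nights` followed by one row per year/month pair. To write a real file instead, pass `open('out.csv', 'w', newline='', encoding='utf-8')` so the csv module controls the line endings.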

Otherwise, R can read PX files, as can Matlab with some hacking; fans of OpenRefine will be happy to know there's an extension, and Node.js hackers can parse them with this package.

