Feasting on Data

A year ago I heard about a little hackathon being held in the middle of a high-powered social good conference called The Feast. Hackers who signed up got full access to the conference and its many perks. It sounded like a good deal to me, so I signed up.

The challenge of The Feast hackathon is building things with data. At the time I was just coming to terms with the gap between the possibilities of data and the reality. Participating helped me articulate everything that bothered me about the current data infrastructure. Instead of doing a traditional hackathon demo, I ended up using my time on the main conference stage to give a mini lesson in why hackathon ideas never become real-life solutions (and also a tutorial on brute force hacking for newbies).

(my part starts at 7:44)

A big target for me in that speech was the Khan Academy’s API, which gives you full access to all of the Khan Academy’s educational resources … with one catch.

There’s no good way to search for content. To get what you need, you have to know exactly what you need and exactly where it is. Oh sure, there’s a topic tree, but you can’t query it. So to get that information you have to download and parse the entire JSON file … all 30 MB of it.
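To make the pain concrete, here is a minimal sketch of what "parse the whole tree yourself" looks like once the file is downloaded. The `title`, `children`, and `kind` field names and the tiny sample tree are assumptions for illustration, not a confirmed description of the Khan Academy format.

```python
from typing import Iterator


def find_by_title(node: dict, keyword: str, path: tuple = ()) -> Iterator[tuple]:
    """Yield (path, node) for every node whose title contains the keyword.

    Assumes each node is a dict with a "title" and an optional list of
    "children" (hypothetical field names).
    """
    title = node.get("title", "")
    here = path + (title,)
    if keyword.lower() in title.lower():
        yield here, node
    # Recurse into children; leaf nodes have no "children" key.
    for child in node.get("children") or []:
        yield from find_by_title(child, keyword, here)


# Tiny made-up stand-in for the real 30 MB topic tree.
sample_tree = {
    "title": "Root",
    "children": [
        {"title": "Math", "children": [
            {"title": "Arithmetic", "children": [
                {"title": "Basic addition", "kind": "Video"},
            ]},
        ]},
        {"title": "Economics", "children": [
            {"title": "Supply and demand", "kind": "Video"},
        ]},
    ],
}

matches = [" / ".join(path) for path, _ in find_by_title(sample_tree, "addition")]
print(matches)
```

Even in this toy version, every lookup walks the entire tree, which is exactly the cost a queryable directory avoids.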

This year Exversion was approached by The Feast to provide data for the hackathon, so I set about fixing the Khan Academy problem. It took me some time to figure out how best to untangle a tree whose branches can nest to any depth, but once I did I was able to produce a queryable directory of the Khan Academy’s content. Why should you have to sort through all the economics content when what you want is videos on basic addition? Now you can grab your Exversion API key and do this:

https://exversion.com/api/v1/dataset/3QC5N5HXIGSJK3Z?key=xxx&title=addition

You can even use complex queries to search the videos themselves. Check out our API documentation for more details.
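If you would rather assemble that request in code, here is a minimal sketch that builds the same URL. The helper name `build_query_url` is made up for illustration, and the only parameters assumed are the `key` and `title` ones shown in the URL above.

```python
from urllib.parse import urlencode

# Base path taken from the example URL above.
BASE = "https://exversion.com/api/v1/dataset/"


def build_query_url(dataset_id: str, api_key: str, **filters: str) -> str:
    """Build a dataset query URL: the API key first, then column filters."""
    params = {"key": api_key, **filters}
    return f"{BASE}{dataset_id}?{urlencode(params)}"


# "xxx" is a placeholder; substitute your own Exversion API key.
url = build_query_url("3QC5N5HXIGSJK3Z", "xxx", title="addition")
print(url)
```

Using `urlencode` means search terms with spaces or punctuation get escaped for you instead of breaking the query string.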

Going geospatial with Exversion

Image by Stamen Maps

Earlier this week we gave a quick, impromptu overview of Exversion at #NYC Beta’s meetup. Most of the talk revolved around some of the idiosyncrasies of PLUTO and MapPLUTO, and the audience, a largely geospatial crowd, wanted to know what GIS functionality, if any, we support.

Geospatial is dear to our hearts, but for the time being all API output is JSON. However, if the dataset contains latitude/longitude or x/y coordinates, you should be able to use it with popular mapping libraries such as Leaflet and D3.js, as well as Google Maps, Bing Maps, et al., allowing you to map the JSON objects returned through our API.

A sample dataset this would work with is Banned and Challenged Books, which we featured during this year’s Publishing Hackathon, held during Book Expo America.

When we run a simple search query on it, or look at the data preview on the dataset’s page, we see that it contains both latitude and longitude columns, along with other information about the challenged title, city, state, challenger, and other details.

The coordinates in the dataset allow us to simply load a generic JSON layer and display points on a map, as in this Publishing Hackathon example by Jackon Lin, who used the Banned and Challenged Books dataset in his visualization. *Displayed at the bottom of the page.
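As a rough illustration of that step, the sketch below turns plain JSON records into a GeoJSON FeatureCollection, a format that libraries such as Leaflet can load as a layer. The column names `latitude` and `longitude` and the sample rows are assumptions; check the dataset preview for the real field names.

```python
import json


def to_geojson(records, lat_key="latitude", lon_key="longitude"):
    """Convert flat records into a GeoJSON FeatureCollection of points.

    Rows with missing or non-numeric coordinates are skipped.
    """
    features = []
    for rec in records:
        try:
            lat = float(rec[lat_key])
            lon = float(rec[lon_key])
        except (KeyError, TypeError, ValueError):
            continue  # no usable coordinates in this row
        props = {k: v for k, v in rec.items() if k not in (lat_key, lon_key)}
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": props,
        })
    return {"type": "FeatureCollection", "features": features}


# Made-up rows standing in for API results.
rows = [
    {"title": "Example Book", "city": "Springfield", "state": "IL",
     "latitude": "39.78", "longitude": "-89.65"},
    {"title": "No Coordinates", "city": "Unknown", "state": "",
     "latitude": "", "longitude": ""},
]

collection = to_geojson(rows)
print(json.dumps(collection, indent=2))
```

The one detail that trips people up is the coordinate order: GeoJSON puts longitude first, the reverse of how most of us say "lat/long".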

While this isn’t yet a complete answer to a GIS data API, it’s a step in the right direction. As we develop Exversion further, we hope to build in geospatial functionality that will make it easy, simple, and intuitive to import data hosted on the platform into a wide suite of geospatial data visualization tools.

In the meantime, if you build any apps on the platform, geo or otherwise, we would love to see them. Please send your work to info @ exversion.com and we’ll try to feature as many of them as we can.

Now go click on that map and see what books people have tried to ban in the United States.


Every piece of NYC’s real estate data is now accessible through our API


This week we announced that The City of New York Primary Land Use Tax Lot Output (PLUTO) database is now machine readable. Less than a week after the City made the database publicly available, we’ve made all PLUTO data readily queryable and freely available via the Exversion API.

This means that city planners, community boards, researchers, and other people seeking commercial and residential real estate data can quickly and easily search hundreds of thousands of records.

Normally you’d have to pay the city for this data, clean it, and upload it to your own server. Now that it’s machine readable, anyone with an internet connection can instantly start deriving insight from it. We’re very excited to see what people do with this data, the types of applications they build with it, and what they’ll be able to uncover.

API Wrapper for Machine-Readable Data

As we crawl and index the world’s data silos, we’ve noticed that some already have APIs. To that end, we’ve created an API wrapper for data sets that are already accessible in a machine-readable format.

The benefit of this is that you’re now able to build on top of both NYC data and Chicago data through the Exversion API without having to go to either city’s open data site.


While the functionality is there, some problems persist. Chiefly, we’re limited to the API functionality of the host website, so some API functions may not be available.

For example, while you can query a data set housed on Exversion for all results that contain a single word in a column, the same cannot be said for external data we’ve wrapped from places such as NYC, San Francisco, or Chicago, where you’ll need to query on the column’s full text.

This is why we urge you to download the data set and upload it to the platform to take full advantage of our API. You can do this easily by clicking the Menu box, selecting Download and then CSV, and then clicking the green Upload button. When you do this, the meta, description, and data attribution fields will be automatically populated. Just add a blurb describing the changes (if any), and upload the file.

For the City of Chicago / Christmas Tree Drop of Locations data set, the entire process took approximately a minute and made the data readily consumable by anyone on the site. Because the data is now hosted, it will appear at the top of the results when you or anyone else searches for it on Exversion.

 

You can now search over 20k data sets on Exversion.


At the end of June we added metadata for over 20,000 data.gov data sets on Exversion, and are in the process of adding thousands of other data sets that are housed on CKAN installations throughout the world.

The fundamental problem with CKAN metadata aggregation, however, is that the CKAN API will not let you query against the actual data set. Instead it lets you query what types of data sets exist on each installation of the software, which makes the platform useless in terms of machine-readable data.
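To show what that kind of query looks like, here is a hedged sketch against CKAN’s `package_search` action, which searches data set metadata. The host `ckan.example.org` and the trimmed `sample_response` are hypothetical stand-ins, and a real installation’s response will carry many more fields.

```python
from urllib.parse import urlencode


def ckan_search_url(base_url: str, query: str, rows: int = 10) -> str:
    """Build a metadata search URL for a CKAN installation."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def resource_links(response: dict) -> list:
    """Pull (title, format, url) for each resource out of a search response."""
    links = []
    for package in response.get("result", {}).get("results", []):
        for resource in package.get("resources", []):
            links.append(
                (package.get("title"), resource.get("format"), resource.get("url"))
            )
    return links


# Hypothetical, heavily trimmed response used only for illustration.
sample_response = {
    "success": True,
    "result": {
        "count": 1,
        "results": [
            {
                "title": "Crash Statistics by State",
                "resources": [
                    {"format": "CSV", "url": "https://ckan.example.org/files/crashes.csv"},
                ],
            }
        ],
    },
}

print(ckan_search_url("https://ckan.example.org", "crash statistics"))
print(resource_links(sample_response))
```

In this sketch the search hands back titles and file links rather than rows, so filtering on a column value means fetching and parsing the file yourself.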

This model distributes static .csv files along with secondary and tertiary links to off-site data, making it very difficult to aggregate much of anything aside from the links and metadata. As such, we’re asking you, the crowd, to help populate these data sets. To foster this process, we’ve provided a simple applet that will allow you to upload a specific data set.


In this example, I’ve searched for Crash Statistics by state on Exversion, but the data set has yet to be imported because it first needs to be cleaned. Following our style guide, I quickly made the dataset “Exversion ready”, i.e. a CSV file, and uploaded it to the site with a description of the changes. The dataset is now available here.

While this is not the perfect solution to the larger problem of not having easily accessible machine readable data, it allows the data community to come together and help make data that has been previously inaccessible, machine readable.

At the same time, while this is the status quo for data housed / linked to on CKAN installations, we’re working on a few projects that should wholly integrate data housed on other platforms.

If you guys have any questions, ask them in the comments or feel free to write us at info @ exversion.com.