Open Data Mashup
The data for our opendata mashup came from a number of different sources (expect a blog post on this very soon!) and we created a SQL Server 2012 (Denali) Tabular Model in Analysis Services to bring together a number of external opendata sets relating to higher education institutions. The main source of data about each HE institution was obtained by extracting data from the Department for Education Edubase system, which contains data on educational establishments across England and Wales. Initially, we wanted our app to query the Google Image Search API dynamically to return a corresponding image, however this proved too slow to realistically use. Instead, as our app will directly query the tabular model I decided to pull in the images associated with each institution directly into the tabular model so it is available for the app to query.Both PowerPivot and the Tabular Model designer in BIDS allow you to import data from an OData (Open Data Protocol) feed to a model. Odata is a web protocol for querying and updating data based on Atom syndication and RESTful HTTP-based services. You can find out more information over at odata.org. So, to bring the image search results into the tabular model, I built a simple data service using WCF Data Services which serves up the image URLs as an OData feed which can then be consumed in the tabular model. The data service uses the WCF Data Service Reflection Provider to define the data model for the service based on a set of custom classes.
One issue is that the image search needs to be based on a known list of institutions (which are already in our model), so the service needed to query the model first to get the list of institutions which could then be used as the basis of our query. This does have the advantage that the image URLs will be updated dynamically if the institution data in the model changes, however it is dependent on the feed data being processed after the institution data.
OData Service
To create the OData service I created a simple .NET class called WebImageSearch which can be used to call the Google Image Search REST Service and extract the first image matching the search results:This class provides two methods, ImageSearch (which is used to call the Google Image Search REST service and carry out the search):
and ProcessGoogleSearchResults to process the JSON returned from the service call. Note that this uses an open source library called Jayrock which allows you to easily process JSON using DataReaders.
The ImageSearch method returns a list of urls as a collection of objects of type Image. This class is decorated with a number of attributes which tell the WCF Data Service Reflection Provider how to interpret the class as a data feed entity. In this case a DataServiceKeyAttribute defines the attribute which is the key for the entity in the resulting data feed. EntityPropertyMapping attributes (whilst totally optional) define a mapping from class properties to entity properties. I found that unless these were specified no data is returned in the feed when displayed in the browser.
A helper class (in this case called InstitutionImages) does the business with reading the existing institution names from the tabular model, passing these to the ImageSearch method in the WebImageSearch class and builds a list of type List<Image> containing the returned image URLs and image IDs (so that we can relate them back to the institutions in the model). We also need to provide a property on this class which returns an IQueryable interface which are the entity sets of Image entity types in the data feed:
Finally to complete the picture, we need to add a new WCF Data Service to the project. In the data service definition we define it as a DataService of type InstitutionImages (which is the name of the class containing our IQueryable property) and also need to define Entity Set access rules:
Importing the data
Now we are ready to test the service! To return data from the OData service, we can simply define the entity set we are interested in in the URL, for example:http://localhost:12345/googleimagesodata.svc/ImagesIf we browse to this URL in the browser this will return the data feed as an Atom syndication. To add the feed to our tabular model in SQL Server Denali BIDS Tabular Model Designer, Click the Import from Data Source button and select “Other Feeds” and specify the URL to the OData service:
The data feed will then be imported into the tabular model:
Thoughts on other approaches
This is not an ideal solution by any means, but it does give a way to import more dynamic data which originates from APIs and services which don’t provide a specific OData feed into a tabular model. This approach would be fine if the data being imported was not dependent on the data in another table (in this case the URLs to return are dependent on each institituion), however where there is a relationship, it becomes a bit messy as the service needs to query the model, creating a circular dependency.A better approach would be to create a user defined function which can then be used to dynamically populate data in a new column based on existing data in the table (sort of like user defined functions in multidimensional modelling). Unfortunately this is not possible with a tabular model in SQL Server Denali. Perhaps this is something that the SQL Azure Data Explorer project may enable us to deal with more elegantly in the future by combining these sources prior to loading into the tabular model?
This has started me thinking of how OData support in SQL Server Denali could be enhanced in order to fully take advantage of the capabilities of OData, but that is the subject of another post I think!
No comments:
Post a Comment