Project Description

"This blog is updated by the JISC-funded G3 Project (#jisc3g) team. We are building a framework for teaching and communicating relevant geographic concepts and data to learners from outside the world of geography and GIS. We think this blog will be of particular interest to those working or teaching in HE and FE, and to those interested in teaching, learning and e-learning."


Tuesday, 22 March 2011

Congratulate me - I'm a data scientist

(And, if you are reading this blog, you might be one too).

David Flanders (at JISC, who fund the G3 project) has asked us to comment on a report about the O’Reilly Strata Big Data conference, and on how we think Big Data will impact our project. I was also lucky enough to attend a workshop on Linked Data last week, hosted by the British Computer Society and the UK Location programme. As I see it, there is a lot of overlap between these topics.

Firstly, what do I understand by ‘Big Data’ and ‘Linked Data’? For me, ‘Big Data’ refers to large datasets (many millions of elements). In the Strata report, however, ‘Big Data’ is defined more broadly: it is about processing and managing data (especially large data) and gaining a competitive advantage by analyzing that data and combining it in new ways. ‘Linked Data’ also seeks to create new information from connections between individual elements of data, but in a much more formally defined way, using Uniform Resource Identifiers (URIs) and triples (subject, predicate, object statements).
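To make the idea of triples concrete, here is a minimal sketch in Python. It is not part of the G3 project and does not use a real RDF library; the `example.org` URIs, property names and facts are all hypothetical. Each statement is stored as a (subject, predicate, object) tuple, and two separately published datasets become linked simply because they use the same URI for the same place.

```python
# Minimal sketch of Linked Data-style triples (hypothetical URIs and facts).
EX = "http://example.org/"  # hypothetical namespace, for illustration only

# Dataset A (hypothetical): descriptive facts about places
places = {
    (EX + "place/leeds", EX + "prop/name", "Leeds"),
    (EX + "place/leeds", EX + "prop/country", EX + "place/england"),
}

# Dataset B (hypothetical): facts about stations, referring to places by URI
stations = {
    (EX + "station/leeds-city", EX + "prop/locatedIn", EX + "place/leeds"),
    (EX + "station/leeds-city", EX + "prop/name", "Leeds City"),
}

# Linking is just the union: the shared URI connects the two datasets
graph = places | stations


def objects(graph, subject, predicate):
    """Return every object o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def subjects(graph, predicate, obj):
    """Return every subject s such that (s, predicate, obj) is in the graph."""
    return {s for s, p, o in graph if p == predicate and o == obj}


# Query across both datasets: names of stations located in the place named "Leeds"
leeds = next(iter(subjects(graph, EX + "prop/name", "Leeds")))
station_names = {
    name
    for st in subjects(graph, EX + "prop/locatedIn", leeds)
    for name in objects(graph, st, EX + "prop/name")
}
print(station_names)  # {'Leeds City'}
```

In practice Linked Data is published in formats such as RDF and queried with SPARQL; the plain set of tuples here only illustrates the underlying idea.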

What strikes me about both ‘Big Data’ and ‘Linked Data’ is that they are two perspectives on the same problem: there is an amazing amount of information out there, and most of it is not stored in a standard relational database format (tables, keys, constraints). This lack of structure makes it far more difficult to extract useful information, make links between datasets and generate new thinking from the data.

However, do ‘Big Data’ and ‘Linked Data’ really refer to new problems? Perhaps in terms of the sheer variety of datasets now available and the formats they are stored in. On the other hand, we’ve been trying to share and combine data to generate new information for a very long time. Indeed, one of the strengths of a GIS is its ability to perform spatial joins, linking two items of data because they occur at the same location rather than because they share a key, as in a traditional relational join. More datasets and more ways of storing and publishing information just make this problem more complicated.
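A rough sketch of the difference, in Python and assuming hypothetical data: the site records and area polygons below share no key column, so a relational join has nothing to match on. A spatial join instead pairs each site with the area whose boundary contains it, here using the standard ray-casting point-in-polygon test rather than any particular GIS package.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def point_in_polygon(pt: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: cast a horizontal ray to the right of pt and count
    how many polygon edges it crosses; an odd count means pt is inside."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def spatial_join(points: List[Tuple[str, Point]],
                 polygons: Dict[str, List[Point]]) -> List[Tuple[str, str]]:
    """Inner spatial join: pair each point id with every polygon containing it."""
    return [
        (pid, name)
        for pid, pt in points
        for name, poly in polygons.items()
        if point_in_polygon(pt, poly)
    ]


# Hypothetical area boundaries (no shared key with the site data)
areas = {
    "North": [(0, 5), (10, 5), (10, 10), (0, 10)],
    "South": [(0, 0), (10, 0), (10, 5), (0, 5)],
}

# Hypothetical sites: only their coordinates connect them to the areas
sites = [("site-a", (2.0, 7.5)), ("site-b", (8.0, 1.0)), ("site-c", (15.0, 3.0))]

joined = spatial_join(sites, areas)
print(joined)  # [('site-a', 'North'), ('site-b', 'South')]; site-c lies outside both
```

Points lying exactly on a shared boundary are an edge case this sketch does not handle deliberately, and a real GIS would also use a spatial index rather than testing every point against every polygon.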

Will these issues impact our project? Not in the short term, as we are providing pre-packaged datasets to be used in pre-created scenarios (so the integration has already been done). In general, however, data issues are certainly worth thinking about for us as GIS professionals.

Oh, and why are we now data scientists? Because, according to the Strata report, we “manage and explore large datasets, and perform up-to-real time analysis”, perform “science-like activities – statistical analysis, complex computational problems” and are “employees, searching for ways to add value to products.”
