
It’s just over a month since we celebrated the success of the Scottish Government Accelerator Programme at our offices in the Bayes Centre. The event followed on from the launch in May, and the four participants presented their projects and discussed their experience with their sponsors and mentors.
The Data Science Accelerator Programme gives analysts from across the public sector the opportunity to develop their data science skills and helps their organisations build data science capability. This pilot version of the programme has involved the Scottish Government partnering with NHS NSS, National Records of Scotland and Registers of Scotland, in a great example of collaborative innovation. The Data Lab helped select the projects from the many applications submitted and matched the participants with mentors based on their area of expertise, drawing on both our own data scientists at The Data Lab and those within our network.
The closing celebration kicked off with a networking lunch, followed by an introduction from Chief Statistician and Data Officer of the Scottish Government, Roger Halliday. Roger praised the participants’ efforts over the last few months and confirmed the Scottish Government’s commitment to continue with the programme. He also emphasised the importance of building on the momentum of this programme and encouraged the participants to act as advocates for developing data science capabilities in their organisations.
Below are summaries of the projects and their outcomes, written by each of the participants:
Using Satellite Information to Improve Agriculture Statistics and Farm Data – Karren Friel, Scottish Government
I was investigating how we would go about developing a crop map of Scotland using Earth observation data from satellites. A crop map of England had already been developed and I was looking at how we could replicate the methodology which uses spatial data analysis and machine learning.
During the accelerator programme, I found out a lot more about the satellite data, the analysis and the machine learning and gained an understanding of what we would need in terms of IT, skills and software for us to develop the map. Having the non-networked laptop was great for researching and trying out new software. I settled on using R for manipulating and analysing the large satellite data files. The machine learning can also be done in an R package. I was able to develop the spatial data analysis on a small scale on the laptop to be scaled up once we have a machine of the right spec in place for producing the full map.
I really enjoyed taking part in the accelerator programme. It was really good to have protected time to develop something new and it was great to have the freedom to install different software and just try it out. The catch-up meetings with other participants and programme leaders were helpful for sharing experiences and staying motivated.
Burden of disease in Scotland for over 100 conditions and injuries – Maite Thrower, NHS NSS
I wanted to develop an interactive data visualisation for the Scottish Burden of Disease to present our results, and to design modern visualisations with the aim of reaching a wider audience and increasing the number of users who refer to our statistics and graphics in their reports and websites. Starting with the design of the visualisation, I managed to produce a Shiny app which allows different users to engage with the data at different levels through different outputs.
Originally, I was going to produce a static tree map and one visualisation, but with the direction of the mentor from The Data Lab I managed to move to a higher level and produce the Shiny app. Public Health Scotland has agreed to fund further development of the interactive visualisation and to publish it on the ScotPHO website. I have also been presenting the results of the programme to other teams in ISD.
The DA programme has been an incredible learning experience. I have had the opportunity to learn from and work with a fantastic mentor who has helped me produce the final output in the most effective and enjoyable way. DA has opened doors for me to new areas of data visualisation in Shiny and connected me with others in the industry, and I am now sharing the knowledge with others within the organisation.
Maite’s mentor, Dr. Caterina Constantinescu (data scientist at The Data Lab), agreed that this was a very rewarding experience:
I really enjoyed working with Maite and seeing her progress over time, from the initial planning stages of the Accelerator programme to when Maite was able to deliver a comprehensive interactive visualisation for the Scottish Burden of Disease data, using R Shiny. It was also a great opportunity for me to further develop my mentorship skills and find flexible ways to advise and support Maite in producing her own high-quality solutions, which greatly enhanced previous static visualisations.
Detecting buildings in historic maps using Open Computer Vision – Marguerite Le Riche, Registers of Scotland
Currently, there are no vector datasets of historical buildings in Scotland. However, the OS County Series Maps from the mid-19th Century onwards contain a potentially rich source of temporal information about the Scottish built environment. If this information can be extracted and turned into geospatial data using techniques like computer vision and machine learning, new avenues of GIS and historic research will be opened up. Such a dataset would be of value to analysts in a range of fields including environmental and social researchers, heritage conservationists and others.
Producing the dataset is an ambitious undertaking and there are many technical challenges to working with such old and variable input data. I’m used to working with maps and spatial data, but the Open Computer Vision (OpenCV) Python library and entry-level machine learning algorithms that I used were all completely new to me.
Through the DSAP, much of the previous work in this area (including my own work and the work of my project mentor James Crone from EDINA) has been brought together, joined up and significantly progressed.
I learned a lot in a short time, and without the support of my employer (Registers of Scotland), perfect project/mentor matching by The Data Lab and generous input from my mentor, it would not have happened.
Matching messy text to Standard Occupation Classifications – Elizabeth Herridge, National Records of Scotland
This project explored the potential for using machine learning techniques to classify text responses in the 2011 population census in order to code one variable: occupation. For the last census, this took 200 manual workers 100 days’ effort. Using a sample of text response data from the last census, the participant was able to accurately classify around 10 times as many text responses as were automatically assigned previously. This is an improvement on what we achieved in 2011, and there is potential to develop these methods further to benefit processing for Scotland’s Census 2021.
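To give a flavour of what classifying messy occupation text can look like, here is a minimal sketch in Python. It is not the method used in the project, which is not described here. It assumes a naive Bayes classifier over character trigrams, a common starting point for short, misspelled text. The training examples are invented for illustration (they are not census data), and the labels `nursing`, `teaching` and `driving` are hypothetical stand-ins for Standard Occupation Classification codes.

```python
import math
from collections import Counter, defaultdict


def trigrams(text):
    """Split a free-text response into lowercase character trigrams."""
    padded = f"  {text.lower().strip()} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


class TrigramNaiveBayes:
    """Multinomial naive Bayes over character trigrams (illustrative only)."""

    def __init__(self):
        self.label_counts = Counter()               # training examples per label
        self.feature_counts = defaultdict(Counter)  # trigram counts per label
        self.vocabulary = set()                     # all trigrams seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.label_counts[label] += 1
            for gram in trigrams(text):
                self.feature_counts[label][gram] += 1
                self.vocabulary.add(gram)
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.feature_counts[label]
            total = sum(counts.values())
            # Log prior plus log likelihoods with add-one (Laplace) smoothing.
            score = math.log(doc_count / total_docs)
            for gram in trigrams(text):
                score += math.log((counts[gram] + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented examples; the labels are placeholders for real occupation codes.
TRAINING = [
    ("staff nurse", "nursing"),
    ("nurse", "nursing"),
    ("registered nurse nhs", "nursing"),
    ("primary school teacher", "teaching"),
    ("teacher", "teaching"),
    ("secondary teacher maths", "teaching"),
    ("bus driver", "driving"),
    ("lorry driver", "driving"),
    ("taxi driver", "driving"),
]

model = TrigramNaiveBayes().fit(
    [text for text, _ in TRAINING],
    [label for _, label in TRAINING],
)

for response in ["nurs", "techer", "lorry drivr"]:
    print(response, "->", model.predict(response))
```

Character trigrams tolerate spelling mistakes because a misspelled response still shares most of its three-letter fragments with correctly spelled examples. A real system would need far more training data and a way to route low-confidence responses to manual coders.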
I really enjoyed the programme and have benefitted hugely from having Roman as a mentor. I think the project has been really successful, and that there is plenty of business potential to develop the machine learning methods trialled in this programme further.
It is clear from the participants that their experience has been very positive, and the enthusiasm in the room was very inspiring. These small projects were carried out over the course of almost four months, at roughly one day a week for each of the participants. They demonstrate just what can be done with the data that we already hold and highlight the importance of nurturing and developing our current workforce. Small projects can have a big impact!