
The launch event for the Data Science Accelerator Programme was an exciting day for our team at The Data Lab. The Data Science Accelerator Programme is a capability-building programme that gives analysts from across the public sector the opportunity to develop their data science skills, and their organisations the opportunity to build their data science capability. The new programme was inspired by the UK Government Data Science Accelerator, which has been running successfully for over two years, and sees the Scottish Government partnering with the NHS in a great example of collaborative innovation.
Roger Halliday from the Scottish Government attended the launch and spoke about the importance of the programme and the commitment from the participating organisations to invest in developing the data skills of their current workforce. We also welcomed the participants selected for the programme from the Scottish Government, the Information Services Division, the National Records of Scotland and Registers of Scotland, along with sponsors from their organisations.
Our own data scientists Caterina Constantinescu and Richard Carter, who will be mentoring participants throughout the programme, had a chance to talk through the projects and get to know the participants before starting work with them.
Leidos provided secure laptops for handling sensitive data, which were formally handed over at the launch event. These laptops will give participants access to software and programs that would not normally be accessible on their work laptops.
We also heard via video call from Matthew Dray in London, a participant in the UK Government Accelerator Programme. He shared his experience of the programme and explained how it has helped his career, which we all found very inspiring.
About The Accelerator Programme
The programme is a Scottish Government collaborative project open to employees of the Scottish Government, the Information Services Division, the National Records of Scotland and Registers of Scotland who work in the analytical professions of statistics, economics, operational research and social research, as well as other staff with a high level of analytical skill.
Participants are expected to identify a project, relevant to the work of their area, that would help tackle a specific problem their business area has identified. The solution should bring a business benefit to the organisation. Participants should also bring a high-value project idea, enthusiasm to try new things and dedication to the course.
We ask for a commitment of at least one day per week for three months, and we encourage participants to spend that day working on their project and talking to their mentor for support. Having this protected time is a key benefit of the programme. Mentors are practising data scientists working either at The Data Lab or at the participating organisations.
At the end of the programme, participants will present their work to the participants working on the other projects. They will also be expected to share what they have learned within their team or division, so that the research has wider benefits, and, where possible, at conferences and events in their profession.
Our pilot Accelerator Programme
The programme will initially be restricted to the four participating organisations, but once it has been piloted we will look to make it more widely available to organisations in Scotland that produce official statistics. If you are interested in hearing more about the Accelerator Programme, contact Victoria.Clark@thedatalab.com
The four projects that have been chosen for the pilot programme are as follows:
Using Satellite Information to Improve Agriculture Statistics and Farm Data (SG: RESAS Agriculture Statistics)
To determine whether a crop map for Scotland can feasibly be developed using Sentinel-1 and Sentinel-2 satellite data from the EU Copernicus programme, and to build an understanding of how machine learning and geospatial analysis could be used in other projects within the analytical unit and beyond. Currently, the data is collected by asking farmers to fill out surveys; this work could mean RESAS no longer needs to collect that data from farmers, reducing the paperwork burden on them.
The project will use machine learning and geospatial analysis.
Matching messy text to the Standard Occupational Classification (National Records of Scotland)
To develop new autocoding methods that assign Standard Occupational Classification (SOC) codes to free-text Census responses to questions on job titles and job descriptions.
In the last Census, assigning classification codes to text responses required 200 manual coders working for 100 days; this delayed data processing and had a considerable knock-on effect on producing outputs. More than a third of the manual coding effort was spent on one variable: occupation.
Automating this coding should vastly reduce data processing timescales and costs, allowing NRS to meet the tight deadlines set for publishing outputs within a year of Census Day. Data quality will also improve, as the inconsistencies that arise between multiple manual coders are eliminated.
NRS plan to produce new methods for automatically matching text to the Standard Occupational Classification coding index using text matching, string processing and machine learning.
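As a rough illustration of how string processing and text matching can be combined, the sketch below uses Python's standard-library `difflib` to match a free-text job title against a small coding index. The index entries, their placeholder codes, the `autocode` function and the 0.8 similarity cutoff are all hypothetical; this is not the method NRS will use, and a real solution would work against the full SOC coding index and, per the project plan, add machine learning.

```python
import difflib
import re
from typing import Optional

# Hypothetical mini coding index: normalised job titles -> placeholder codes.
# These codes are illustrative only and are NOT real SOC codes.
CODING_INDEX = {
    "software developer": "CODE-A",
    "primary school teacher": "CODE-B",
    "registered nurse": "CODE-C",
    "farm worker": "CODE-D",
}


def normalise(text: str) -> str:
    """String processing: lower-case, replace non-letters with spaces, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return " ".join(text.split())


def autocode(response: str, cutoff: float = 0.8) -> Optional[str]:
    """Return a code for the closest index entry, or None to route the response to manual coding."""
    cleaned = normalise(response)
    # Exact match after cleaning is the cheapest case.
    if cleaned in CODING_INDEX:
        return CODING_INDEX[cleaned]
    # Otherwise fall back to fuzzy matching on similarity ratio.
    matches = difflib.get_close_matches(cleaned, list(CODING_INDEX), n=1, cutoff=cutoff)
    return CODING_INDEX[matches[0]] if matches else None


if __name__ == "__main__":
    for answer in ["Software Developer!", "Sofware developer", "astronaut"]:
        print(answer, "->", autocode(answer))
```

Returning `None` rather than forcing a match reflects a common design choice in autocoding: responses below the confidence threshold are left for human coders, so automation reduces manual effort without silently introducing errors.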
Detecting buildings in historic maps (Registers of Scotland)
To automate the extraction of information from historic map images dating back to 1843, using computer vision and/or entry-level machine learning techniques, to provide valuable information about the Scottish built environment.
This project forms part of an ambitious, long-term goal to develop a House Age dataset for Scotland, which would use the survey dates of historic maps to assign an estimated age interval to individual extant buildings.
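The age-interval idea rests on a simple bracketing argument: if a building is absent from a map surveyed in one year and present on a later edition, it was probably built between those two survey dates. A minimal sketch of that logic follows, assuming building detection has already produced a present/absent flag for each map edition. The `age_interval` function and the example survey years are hypothetical, and the sketch assumes detections are reliable (it simply takes the first edition on which the building appears).

```python
from typing import List, Optional, Tuple


def age_interval(
    observations: List[Tuple[int, bool]]
) -> Tuple[Optional[int], Optional[int]]:
    """
    Bracket a building's construction date from a series of historic maps.

    observations: (survey_year, building_present) pairs, one per map edition
    (hypothetical input, e.g. the output of a building detector).

    Returns (built_after, built_by):
      built_after -- last survey year on which the building was absent,
                     or None if it already appears on the earliest map;
      built_by    -- first survey year on which the building appears,
                     or None if it is absent from every map considered.
    """
    built_after: Optional[int] = None
    for year, present in sorted(observations):
        if present:
            # First appearance closes the interval; later editions are ignored.
            return (built_after, year)
        built_after = year
    return (built_after, None)


if __name__ == "__main__":
    # Illustrative editions: absent in the first two surveys, present from the third.
    print(age_interval([(1843, False), (1895, False), (1914, True), (1960, True)]))
```

Open-ended bounds (`None`) matter here: a building already present on the earliest map can only be said to predate that survey, which is why the dataset would hold an interval rather than a single age.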
This will help open up new avenues for GIS and historical research. A completed House Age dataset would not only enrich the Registers of Scotland house sales statistics, but could also be of value to researchers working in history, house condition surveys, fuel poverty maps, deprivation indices and heritage conservation.
This proof-of-concept work would use novel data science approaches that could later be developed into a fuller project answering wider questions about property ages, distributions and changes over time.
Burden of disease in Scotland for over 100 conditions and injuries (NHS: Information Services Division)
To identify the most suitable visualizations for the results of a study estimating the burden of disease in Scotland for over 100 conditions and injuries, and to present these results in an engaging way that enhances the user experience.
These estimates give policy makers and other partners working in health planning and resource allocation an opportunity to see the big picture, to compare the relative importance of different diseases, injuries and risk factors, and to understand which are the most important contributors to health loss in a given place, time, age-sex group and local geography.
To date, only a small portion of the results has been published, and many more statistical outputs are still to be released. The project will use data visualization and user experience principles to produce outputs that are engaging, accurate and communicate key messages effectively.
Encouraging collaborative innovation
We are delighted to be encouraging collaborative innovation in the data science community and are looking forward to seeing the real economic benefits that these valuable data science projects will bring to their organisations.
If you are interested in hearing more about the Accelerator Programme, contact Victoria.Clark@thedatalab.com