The Data Lab

Do you know what makes a Data Scientist? No worries, AdzunaDataBot is at your service

News 23/04/2016

Who is a Data Scientist? Are they statisticians who have a solid understanding of data and modeling, and love commenting on the shape of bell curves? Are they scientists who can design experiments and test hypotheses? Are they programmers who eat code for lunch and can process vast amounts of data in their sleep? Or are they business analysts who understand which questions are relevant to ask when looking at a data set?

The Data Science job market is still fairly nascent, and the question of what makes a Data Scientist has not been answered conclusively. That is why we conceived the AdzunaDataBot! The project had a two-fold goal: to build an understanding of the Data Science job market in the UK and monitor it on a continuous basis, and to build expertise in product development on one of the major cloud platforms through a pilot project. AdzunaDataBot gathers jobs data from Adzuna, a UK-based job board aggregator, stores and processes it on a cloud platform, and presents it visually in an easily interpretable format for interested users. Adzuna’s data store can be accessed through Adzuna’s web API, which can be queried by keyword and provides a rich variety of information about each job ad posted on the different job boards Adzuna queries.
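To make the idea of querying a jobs API by keyword concrete, here is a minimal sketch in Python (the project itself used R). The base URL, the path layout and the parameter names `app_id`, `app_key`, `what` and `results_per_page` are assumptions based on Adzuna's public developer documentation and should be checked there before use; `build_search_url` is a hypothetical helper, not part of any released package.

```python
from urllib.parse import urlencode

# Assumed base URL; verify against Adzuna's developer documentation.
BASE_URL = "https://api.adzuna.com/v1/api/jobs"


def build_search_url(keywords, app_id, app_key, country="gb", page=1, per_page=50):
    """Return a keyword-search URL for one page of job adverts."""
    params = {
        "app_id": app_id,            # credentials issued on registration (assumed names)
        "app_key": app_key,
        "what": keywords,            # free-text keywords to search for
        "results_per_page": per_page,
    }
    return f"{BASE_URL}/{country}/search/{page}?{urlencode(params)}"


url = build_search_url("data science", app_id="YOUR_APP_ID", app_key="YOUR_APP_KEY")
print(url)
```

Only the URL is built here, so the sketch runs without network access or real credentials; fetching it with any HTTP client would be the next step.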

While it may be true that a ‘complete’ data scientist would (should?) have all the skills mentioned above, few among us can claim to have reached that pinnacle. However, all aspiring Data Scientists begin in one of those areas of speciality and build their skills in the other areas as their careers progress. But how do we prioritize these skills by the order in which they are valued in the job market today? We, at The Data Lab, took a Data Science approach to this Data Science problem by analysing actual jobs data to see what the market says, and voilà, AdzunaDataBot!

Not only will this data from AdzunaDataBot be useful to individuals who want to make smarter career choices, it will also be very useful for program coordinators at universities, skills academies and bootcamps, helping them to correctly identify the different kinds of data science positions and tailor each of their programs to better provide the required skills to their students. This goes to the heart of the core mission of The Data Lab, which is to drive collaboration between Scottish industry, public sector and academia, to exploit the value of Data Science together. Training people up with the right skill sets is the first step in ensuring Scottish industry is in the best position to exploit the techniques of Data Science effectively.

A public preview of the AdzunaDataBot is available here.

An API a day, and with a cloud solution to play, makes an easy data product today

APIs, or Application Programming Interfaces to give them their full credentials, are becoming increasingly common on the web, with all kinds of services wanting to build an ecosystem around their product by enticing developers with the ‘cool-factor’ of their API. Web APIs offer a well-defined way to programmatically access the underlying data which powers many of the services we use on the web today. The move towards a more open data culture has further encouraged open APIs. While Facebook and Twitter might be the first ones to come to mind, they are by no means the only ones to offer access to their data mines. All kinds of services offer API access to their data, including flight pricing engines, job boards, hotel booking services and auction websites, among many others. Developers from yesteryear might look back nostalgically on their days scraping data from HTML pages, but Data Scientists are not complaining! The ability to access clean datasets through APIs allows us to spend more time building the data product, which after all is the more interesting (and potentially lucrative) bit.

Cloudy days ahead

So with this in mind, we started this project with the aim of collecting data from the Adzuna web API, storing it, and building a data product around it. We decided to build this solution on the cloud to guarantee consistency and interoperability between platforms. We evaluated a few different cloud solutions, including PaaS solutions like IBM BlueMix and Heroku and the IaaS solution AWS EC2. While each platform has its advantages, we chose AWS because we wanted to start with a simple solution without too many fancy services attached to it. AWS EC2 is simple to configure and get started with, and it has great documentation to get unfamiliar developers up to speed quickly. The AWS free tier, which is available to anyone for a 12-month introductory period, was sufficient for this task.

Implementation

The AdzunaDataBot was implemented completely in R. The infrastructure components from AWS included a free tier EC2 Linux box and a MySQL database to store the data. To configure the EC2 environment for running R, we followed this easy-to-follow blog post by Amazon. The API call returns a JSON object which can be easily read into an R dataframe. Since making a call to the API and returning a dataframe object is core functionality which can be leveraged across many different applications, it was implemented as an R package called “adzunar”, which has been released separately on GitHub. This allowed us to experiment with the search terms and the results while abstracting away the details of actually making the API calls. We set up a job to query the Adzuna API on a daily basis with the keywords “data science” and store the results in the MySQL DB. The free tier of the MySQL DB on Amazon RDS comes with 750 hours of usage and 20GB of storage, which is more than sufficient for the prototype we built. This data was then queried on a daily basis to render an HTML page using flexdashboard, a super cool publishing tool available for R. Flexdashboard lets non-web developers (like us) render R plots (such as those from ggplot2) onto a beautiful HTML page with just a few lines of code!
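The fetch, flatten and store steps described above can be sketched roughly as follows. This is an illustration in Python rather than the project's R code, and it uses an in-memory SQLite database as a stand-in for MySQL on Amazon RDS. The payload and its field names (`results`, `id`, `title`, `location.display_name`, `salary_min`, `created`) are made up for the example and are assumptions about the response shape. Skipping adverts that were already stored is also an assumption; the article does not say how repeated adverts were handled.

```python
import json
import sqlite3

# Illustrative payload only; field names are assumed, not taken from the API docs.
SAMPLE_RESPONSE = json.dumps({
    "results": [
        {"id": "101", "title": "Data Scientist", "salary_min": 45000,
         "location": {"display_name": "London"}, "created": "2016-04-20T09:00:00Z"},
        {"id": "102", "title": "Data Engineer", "salary_min": 55000,
         "location": {"display_name": "Edinburgh"}, "created": "2016-04-21T09:00:00Z"},
    ]
})


def flatten(response_text):
    """Turn the nested JSON response into flat rows, one per advert."""
    rows = []
    for ad in json.loads(response_text).get("results", []):
        rows.append((
            str(ad["id"]),
            ad.get("title"),
            ad.get("location", {}).get("display_name"),
            ad.get("salary_min"),
            ad.get("created"),
        ))
    return rows


def store(conn, rows):
    """Insert rows, ignoring adverts whose id is already in the table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "id TEXT PRIMARY KEY, title TEXT, location TEXT, salary_min REAL, created TEXT)"
    )
    conn.executemany("INSERT OR IGNORE INTO jobs VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")     # stand-in for the MySQL database
store(conn, flatten(SAMPLE_RESPONSE))
store(conn, flatten(SAMPLE_RESPONSE))  # a second daily run returning the same adverts
count = conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]
print(count)
```

Keying the table on the advert id means a scheduled daily run can be repeated safely: adverts seen on a previous day are ignored rather than duplicated.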

The complete code for this implementation is available on GitHub for any interested Data Scientists out there. This project is still a work in progress, so contributions are most welcome!

Nuggets from AdzunaDataBot: Programmers are in demand

The development version of the app has already yielded some pretty useful information regarding the Data Science job market in the UK.

We know that:

  • The top five skills mentioned are Python, statistics, Java, Hadoop and Spark, so Programmers/Data Engineers are clearly in demand
  • London forms the most significant hub for Data Science jobs in the UK
  • The median salary on offer is £49k per annum
  • There is a large variance in the salaries on offer, starting from £20k all the way to £200k
  • The most popular buzzword in use among all the job adverts is “analytics”
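Figures like these can be derived from the stored adverts with a few lines of code. The sketch below is in Python rather than the project's R, and the adverts in `ADS` are invented for illustration, not AdzunaDataBot data. It counts how many adverts mention each skill and computes the median salary. The skill list and the whole-word matching rule are assumptions about one reasonable way to do this, not a description of the project's actual method.

```python
import re
from collections import Counter
from statistics import median

# Made-up adverts for illustration only.
ADS = [
    {"description": "Python and Spark experience, strong statistics.", "salary": 60000},
    {"description": "Java developer with Hadoop and python skills.", "salary": 45000},
    {"description": "Analytics role using statistics and Python.", "salary": 30000},
]

SKILLS = ["python", "statistics", "java", "hadoop", "spark"]


def count_skills(ads, skills):
    """Count the number of adverts that mention each skill as a whole word."""
    counts = Counter()
    for ad in ads:
        text = ad["description"].lower()
        for skill in skills:
            # \b keeps "java" from matching inside "javascript"
            if re.search(rf"\b{re.escape(skill)}\b", text):
                counts[skill] += 1
    return counts


skill_counts = count_skills(ADS, SKILLS)
median_salary = median(ad["salary"] for ad in ADS)
print(skill_counts.most_common())
print(median_salary)
```

Each advert is counted at most once per skill, so a description that repeats a keyword does not inflate that skill's ranking.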

For future work, we can look to split the analysis by experience level to identify the skills required for entry-level data scientists vs those required for experienced hires.

The Data Lab is part of the University of Edinburgh, a charitable body registered in Scotland with registration number SC005336.

© 2023 The Data Lab
