The Data Lab

Back to Basics: What is Machine Learning?

Tech blog 01/04/2022

Joanna Machine Learning

In our new series, Back to Basics, we aim to simplify the answers to some of the most frequently asked questions around data technologies and data science.

In this blog, The Data Lab’s Principal Data Scientist Joanna McKenzie draws on her experience as an analyst and team leader to tackle some of the most popular machine learning questions. Joanna has over 10 years’ experience in software development, using tools such as MySQL, MATLAB, Python, and R.

What is machine learning?

Joanna: So traditionally, when people interact with computers, they are programming. Programming is essentially trying to get a computer to perform a task by giving it step-by-step instructions. For data, that doesn’t always work so well; it can be quite difficult to do, because sometimes the patterns aren’t very obvious.

So, in machine learning, you’re essentially allowing the computer to learn what the pattern is by itself. It extracts the answer from what’s contained in the data, without you having to pull that pattern out and code it in yourself. That gives you a lot of power to understand data and to make connections between things in a way that you wouldn’t be able to on your own.
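To make the contrast between programming and learning concrete, here is a minimal sketch using a hypothetical dataset of temperatures and whether an ice-cream stall had a busy day (the data, the 20 °C rule and the function names are all illustrative assumptions, not from the interview). The first function is "programming": a person writes the pattern in. The second is about the simplest possible form of machine learning, a one-feature decision stump that tries each observed value as a cut-off and keeps whichever one makes the fewest mistakes on the data.

```python
# Hypothetical data: (temperature in °C, was it a busy day at an ice-cream stall?)
examples = [(10, False), (13, False), (15, False), (17, True), (18, True), (21, True), (25, True)]


def hand_coded_rule(temp_c: float) -> bool:
    # Programming: a person decides the pattern and writes it in as an instruction.
    return temp_c > 20


def learn_threshold(data):
    """Machine learning in its simplest form (a one-feature 'decision stump'):
    try each observed value as a cut-off and keep the one with the fewest errors."""
    best_t, best_errors = None, len(data) + 1
    for t in sorted({x for x, _ in data}):
        errors = sum((x > t) != label for x, label in data)
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


learned_t = learn_threshold(examples)
print("learned rule: busy if temperature >", learned_t)
print("hand-coded rule errors:", sum(hand_coded_rule(x) != y for x, y in examples))
print("learned rule errors:", sum((x > learned_t) != y for x, y in examples))
```

The hand-written cut-off of 20 °C misses the busy days at 17 and 18 °C, whereas the learned cut-off comes straight from the data. Real machine learning algorithms search far richer families of patterns, but the idea of fitting the rule to the data rather than writing it by hand is the same.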

What is machine learning good for?

Joanna: First and foremost, it’s automation. It takes something that would otherwise be manual, difficult, and time-consuming and makes it much faster. You can also get more accuracy from it, because computers are really good at processing information: they can sort through a far greater volume of connected information at once than a human brain can manage.

Computers can sometimes bring up patterns that humans would miss, and that can be a real strength. However, it can also be something to watch out for: computers will pull out patterns that are in the data, but these may not hold up in the real world. For example, if information is missing or not provided in the dataset, the computer can’t take it into account. In that case, it might run off in the wrong direction and find a pattern that doesn’t work in any broader context.

Say, for instance, I’m doing a simple machine learning prediction of sales for the coming week. I take the data from the last eight or nine weeks, tell it whether each day is a weekday or a weekend, and feed that into the algorithm. For a while, I check my actual sales against the forecast and it seems fine, until suddenly my forecasts can’t keep up with real-life sales. That’s when I realise that sales are climbing because Christmas is approaching. The algorithm has no way of accounting for that: it doesn’t know what Christmas is, or how it changes people’s buying patterns. So the data that isn’t there (data for previous Christmas seasons, in this case) is making my prediction inaccurate.
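The Christmas example can be reproduced in a few lines. This is a minimal sketch, assuming daily sales figures and a model whose only input is a weekday/weekend flag; the sales numbers are synthetic and the December ramp in `synthetic_sales` is a hypothetical stand-in for the real-world effect the model never sees. The "model" simply learns the average sales for weekdays and for weekends from nine autumn weeks.

```python
from datetime import date, timedelta
from statistics import mean


def is_weekend(d: date) -> bool:
    return d.weekday() >= 5  # Saturday = 5, Sunday = 6


def synthetic_sales(d: date) -> float:
    """Hypothetical 'real world': 100 units on weekdays, 150 at weekends,
    plus a pre-Christmas ramp through December that the training data never shows."""
    base = 150 if is_weekend(d) else 100
    if d.month == 12:
        base *= 1 + d.day / 25
    return base


def fit(history):
    """Learn the average sales for weekdays and for weekends (the only feature)."""
    return {
        weekend: mean(s for d, s in history if is_weekend(d) == weekend)
        for weekend in (False, True)
    }


def predict(model, d: date) -> float:
    return model[is_weekend(d)]


# Train on nine weeks of autumn data (27 Sep to 28 Nov 2021)
start = date(2021, 9, 27)
history = [(start + timedelta(days=i), synthetic_sales(start + timedelta(days=i))) for i in range(63)]
model = fit(history)

# Forecast three Mondays: the gap between forecast and actual grows towards Christmas
for d in (date(2021, 11, 29), date(2021, 12, 13), date(2021, 12, 20)):
    print(d, "forecast", predict(model, d), "actual", round(synthetic_sales(d)))
```

The forecast matches at the end of November but falls further behind each week in December, because nothing in the training data or the features tells the model that December is different.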

Do the benefits you mentioned mean that machine learning can be good for avoiding mistakes? Does it remove the risk of human error?

Joanna: Yeah, to some extent. If you’re automating a process, very often that’s a real strength. When you’re asking people to do something manually over and over again, they get tired and they make mistakes. Any system that automates the task will avoid that kind of error.

However, it’s worth noting that when you’re programming something yourself and the computer starts to make mistakes, you can go back and manually make a change so that the mistake won’t happen again. You end up with a much stronger process that just gets better over time. In a machine learning approach, by contrast, you don’t have the same level of control over the exact action the algorithm will take. As a result, it can still make mistakes and can sometimes be inaccurate, which means you have to be more careful.

What is the difference between machine learning and AI?

Joanna: So firstly, I don’t really consider AI a technical term. Speaking as a Data Scientist, I wouldn’t usually talk about AI in general, because it’s a very evocative term, and what it evokes in people’s minds can be very broad. Many people still associate AI with the technology in films such as 2001: A Space Odyssey: something that speaks back to you and can process information in lots of different contexts. At this stage, that doesn’t really exist in the real world. It’s not something that I could, as a Data Scientist, just go away and build.

In a more technical sense, AI tends to refer to the more advanced applications of machine learning: multiple machine learning algorithms working together, each doing a different job, to build a much more capable system. Voice-activated assistants such as Siri and Alexa are really good examples of this. They break down human conversation using one system, and perhaps respond to a command using a different one. That’s closer to what AI looks like in the day-to-day world.

Machine learning, by contrast, is essentially a black-box algorithm: a way of pulling a pattern out of data. So it’s much more specific and easier to define in a technical sense.

Can you give us any machine learning examples?

Joanna: Absolutely. One of the most common things that people will be very familiar with is recommendation engines. If you’re browsing on Netflix, for example, it will keep an eye on the kind of things that you’re watching and, from this, offer up other content that it thinks you will enjoy. Essentially, it’s taking data on what you’re watching, comparing it with what lots of other people are watching, and then using that to rank content.
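"Comparing what you watch with what others watch, then ranking" can be sketched as user-based collaborative filtering. This is a minimal illustration, not Netflix’s actual method (which is not described here): the viewers, titles and the choice of Jaccard similarity (shared titles divided by all titles either person watched) are assumptions. Each unseen title is scored by adding up the similarity of every other viewer who watched it.

```python
def jaccard(a: set, b: set) -> float:
    """Similarity of two viewing histories: shared titles / all titles seen by either."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user: str, history: dict, k: int = 3) -> list:
    """Rank titles the user hasn't seen by the summed similarity of the viewers who watched them."""
    seen = history[user]
    scores = {}
    for other, titles in history.items():
        if other == user:
            continue
        sim = jaccard(seen, titles)
        if sim == 0:
            continue  # viewers with nothing in common contribute nothing
        for title in titles - seen:
            scores[title] = scores.get(title, 0.0) + sim
    # Highest score first; ties broken alphabetically so the output is stable
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical viewing histories
history = {
    "ana": {"Drama A", "Thriller B", "Documentary C"},
    "ben": {"Drama A", "Thriller B", "Thriller D"},
    "cal": {"Drama A", "Comedy E"},
    "dee": {"Cartoon F", "Comedy E"},
}

print(recommend("ana", history))
```

Ben shares two of Ana’s three titles, so his other thriller ranks first; Cal shares only one, so his comedy ranks lower; and Dee’s cartoon is never suggested because Dee has nothing in common with Ana.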

Amazon has the same thing. I love their platform as an example because, if you’ve ever clicked on something that’s not particularly popular on Amazon and then looked at the recommendations underneath, sometimes they’re really wild. You get some really odd recommendations because only 3 or so people have bought it in the last year. And, those 3 people have then moved on to buy something totally unrelated, such as a great uncle’s birthday gift. Because the system has such small amounts of data, there’s not enough to do something sensible. As a result, whilst buying a book about rabbits, you might end up with a recommendation for something completely unrelated, like a plant pot.
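The rabbit-book problem shows up in even the simplest "customers who bought this also bought" counter. In this sketch (hypothetical baskets and function name; not Amazon’s actual system), co-purchases are counted among everyone who bought the item. With only three buyers, whatever those few people happened to buy next becomes the top recommendation. The optional `min_buyers` cut-off is one common guard, added here as an illustrative assumption: it refuses to recommend anything until enough people have bought the item.

```python
from collections import Counter


def also_bought(item, baskets, min_buyers=0):
    """'Customers who bought X also bought...' via co-purchase counts.
    Returns [] when fewer than min_buyers customers bought the item."""
    buyers = [b for b in baskets if item in b]
    if len(buyers) < min_buyers:
        return []  # too little data to say anything sensible
    counts = Counter(other for b in buyers for other in b if other != item)
    return [other for other, _ in counts.most_common()]


# Hypothetical purchase history: only three people bought the rabbit book
baskets = [
    ("rabbit book", "plant pot"),
    ("rabbit book", "phone case"),
    ("rabbit book", "plant pot", "birthday card"),
    ("gardening gloves", "plant pot"),
]

print(also_bought("rabbit book", baskets))                 # dominated by a few buyers' unrelated purchases
print(also_bought("rabbit book", baskets, min_buyers=20))  # suppressed: not enough buyers
```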

What is the best programming language for machine learning?

Joanna: Machine learning is available in a lot of programming languages now. The most popular data science languages at the moment are R and Python, but there are other options out there. Both have libraries full of ways of doing machine learning, and both have strengths and weaknesses. Categorical data is a standout difference between R and Python.

Python isn’t great for categorical data: many Python machine learning libraries, such as scikit-learn, expect numeric input, so you have to translate your categories into numbers. So, if you have 4 categories (excellent, mostly good, mostly bad, terrible), you’d need to code them as 1, 2, 3, 4, for example. Meanwhile, R will accept and work with categorical data quite happily.
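The translation step described above can be sketched in plain Python. Because the four ratings have a natural order, ordinal encoding (mapping them to 1 to 4, as in the example) makes sense; for categories with no natural order, one-hot encoding (one 0/1 column per category) is the usual alternative, since it avoids implying that one category is "bigger" than another. The names `RATINGS`, `encode_ordinal` and `encode_one_hot` are illustrative; in practice, scikit-learn’s `OrdinalEncoder` and `OneHotEncoder` (in `sklearn.preprocessing`) do the same job at scale.

```python
# The four ordered categories from the example, best to worst
RATINGS = ["Excellent", "mostly good", "mostly bad", "terrible"]

# Ordinal encoding: map each category to 1..4, preserving the order
ORDINAL = {label: i for i, label in enumerate(RATINGS, start=1)}


def encode_ordinal(values):
    return [ORDINAL[v] for v in values]


def encode_one_hot(values, categories=RATINGS):
    # One 0/1 column per category; suited to categories with no natural order
    return [[1 if v == c else 0 for c in categories] for v in values]


responses = ["mostly good", "terrible", "Excellent"]
print(encode_ordinal(responses))  # [2, 4, 1]
print(encode_one_hot(responses))  # [[0, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0]]
```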

Programming languages all have different strengths and weaknesses. Usually, people use the programming language they’re most familiar with; they don’t pick up a new one every time they have a new machine learning task to do.

Would you say it’s best to go with the programming language you feel most comfortable with?

Joanna: Definitely. It’s important to work within your own capabilities and to be able to check as you go. With machine learning, your work depends very much on context: what’s acceptable and what’s not.

COVID tests are a really good example of this. If a lateral flow test says you’re positive, that positive result is almost certainly correct. But if it says you’re negative, that might not be correct: a negative result doesn’t verify that you don’t have COVID. Even knowing this, somebody has verified that the device is acceptable to use, and it’s the same with machine learning. A model’s predictions won’t always be accurate, but you need to look at when they ARE accurate (and then decide whether that’s acceptable when it comes to putting the model into practice).
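The lateral flow example is about measuring two different kinds of accuracy separately. In machine learning terms, "a positive result is almost certainly correct" is high precision (positive predictive value), while "a negative might be wrong" is limited recall (sensitivity). This minimal sketch computes both from a confusion matrix; the counts are hypothetical, chosen only to illustrate the pattern, and are not real test-performance figures.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # predicted positive, actually positive
    fp: int  # predicted positive, actually negative
    fn: int  # predicted negative, actually positive
    tn: int  # predicted negative, actually negative

    def precision(self) -> float:
        """How often a positive prediction is right (positive predictive value)."""
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        """Share of actual positives that get caught (sensitivity)."""
        return self.tp / (self.tp + self.fn)


# Hypothetical counts for 1,000 people tested, 100 of whom are infected
c = Confusion(tp=70, fp=2, fn=30, tn=898)

print(f"positive result correct: {c.precision():.3f}")
print(f"infections caught:       {c.recall():.2f}")
print(f"infected people told 'negative': {c.fn}")
```

Here a positive is right about 97% of the time, yet 30 of the 100 infected people receive a negative result. Whether that trade-off is acceptable is a decision about the context of use, not something the numbers settle on their own.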

A lot of what a Data Scientist brings to the whole machine learning process (given that the computers are doing some of the heavy lifting for you) is the checks and the context around specific information. When it comes to programming languages, that’s where being in your comfort zone is extremely valuable: you’ve built up lots of tools and techniques, and those capabilities make sure you get the best out of the data.

Joanna and her team of Data Lab Data Scientists regularly support organisations with new digital projects.  

Our Internal Data Science projects include up to 20 days of Data Scientist time and are available to organisations in any sector with a presence in Scotland. Our Data Scientists will work with you to scope your proposal and provide project support.

Tags: machine learning

The Data Lab is part of the University of Edinburgh, a charitable body registered in Scotland with registration number SC005336.


© 2023 The Data Lab
