The Data Lab


Digital security and how not to do it

Tech blog 01/06/2020

Guest blog by Colin Gillespie, Data Scientist at Jumping Rivers

Colin was due to speak at DataTech20, which was unfortunately cancelled due to COVID-19. We’re delighted that he has put together this brilliant blog on digital security for us.

How not to do security

Digital security is everywhere. Unfortunately, bad digital security is also everywhere. Take Disney+, a godsend to parents during this lockdown. When it was first released in the USA, it was plagued with security issues. Essentially, hackers took username and password combinations that had already been compromised elsewhere and used them to access Disney+ accounts. This is commonly known as credential stuffing. It is not a particularly sophisticated attack, but it is simple and clearly effective.
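Credential stuffing tends to leave a recognisable footprint: one source trying many different accounts in a short period. The sketch below shows one simple detection heuristic along those lines. The log format (tuples of IP, username and outcome), the threshold of 20 and the function name are all assumptions for illustration; real services usually combine a check like this with other signals.

```python
from collections import defaultdict


def flag_stuffing_sources(attempts, max_distinct_users=20):
    """Return source IPs that tried more than max_distinct_users distinct usernames.

    attempts: iterable of (source_ip, username, succeeded) tuples covering a
    single time window (a hypothetical audit-log format).
    """
    users_per_ip = defaultdict(set)
    for ip, username, _succeeded in attempts:
        users_per_ip[ip].add(username)
    # A normal user logs into one or two accounts; a stuffing run tries dozens.
    return {ip for ip, users in users_per_ip.items() if len(users) > max_distinct_users}
```

A single IP trying 25 different usernames in the window would be flagged, while one user logging into their own account would not.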

Last week I purchased a Roku, a device that lets your television connect to numerous online streaming services, including Disney+. As someone who takes security seriously, I had secured my Disney+ account with a unique thirty-five-character password (I use a password manager). However, to access Disney+ on the Roku, I had to enter that thirty-five-character password using a tiny remote control, via a terrible interface. Just when I thought things couldn’t get worse, after the first twenty characters the Roku stopped displaying the password I was entering!
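Password managers generate passwords like this from a cryptographically secure random source. A rough Python equivalent, using the standard library `secrets` module, is sketched below. The 35-character length mirrors the password in the anecdote; the character set is an assumption, since real password managers let you configure it.

```python
import secrets
import string

# Assumed character set: upper/lower-case letters, digits and punctuation.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length=35):
    """Return a random password drawn with a cryptographically secure RNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(generate_password())
```

A password like this is trivial for a password manager to fill in, and exactly the kind of string that is miserable to type with a remote control.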

Poor design makes security hard

It’s this sort of poor design that makes security hard. Users try to do the right thing, but are ushered into insecure practices. Of course, if my account was compromised it would be “my” fault.

Amazon, by comparison, asked you to log on to its site with a laptop or phone and enter a code on the TV. Easy, simple and secure. Users are nudged into doing the right thing.
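This kind of pairing resembles the pattern standardised as the OAuth 2.0 Device Authorization Grant (RFC 8628): the TV shows a short code, the user confirms it on a device where they are already signed in, and the TV polls until the pairing is approved. The toy in-memory sketch below illustrates that flow; it is not Amazon's implementation, and the class name, method names and ten-minute lifetime are hypothetical.

```python
import secrets
import time

# Consonant-only base-20 alphabet, in the spirit of the codes RFC 8628 suggests.
CODE_ALPHABET = "BCDFGHJKLMNPQRSTVWXZ"


class PairingService:
    """Toy in-memory device pairing service (hypothetical, for illustration)."""

    def __init__(self, lifetime_seconds=600):
        self.lifetime = lifetime_seconds
        # user_code -> [device_id, expiry_timestamp, approved_account_or_None]
        self.pending = {}

    def start(self, device_id):
        """TV side: create a short code to show on screen."""
        user_code = "".join(secrets.choice(CODE_ALPHABET) for _ in range(8))
        self.pending[user_code] = [device_id, time.time() + self.lifetime, None]
        return user_code

    def _live(self, user_code):
        """Return the pending entry if it exists and has not expired."""
        entry = self.pending.get(user_code)
        if entry is None or time.time() > entry[1]:
            return None
        return entry

    def approve(self, user_code, account):
        """Website side: the signed-in user enters the code to approve the TV."""
        entry = self._live(user_code)
        if entry is None:
            return False
        entry[2] = account
        return True

    def poll(self, user_code):
        """TV side: return the approved account once, otherwise None.

        A real system would hand the TV a scoped device token here,
        never the account password.
        """
        entry = self._live(user_code)
        if entry is None or entry[2] is None:
            return None
        del self.pending[user_code]  # each code can be redeemed only once
        return entry[2]


svc = PairingService()
code = svc.start("living-room-tv")  # shown on the TV
svc.approve(code, "colin")          # entered on the laptop or phone
print(svc.poll(code))               # colin
```

The design point is that the long password is only ever typed on a laptop or phone with a proper keyboard; the TV never sees it.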

Data Science

The difference between data scientists and other members of an organisation is that a data scientist’s job typically involves pulling together a variety of sensitive data sets and then reporting the results to relevant stakeholders, which may include a public-facing website. Coupled with this is the need to work at the forefront of technology. Not a good mix from a security perspective.

Companies and organisations are often, and rightly so, worried about the security of their data assets. However, their solution is often to place barriers and obstacles that impede data scientists from doing their job.

At Jumping Rivers we provide online and onsite training in R and Python. Previously, we would send the client a list of the R/Python packages required for the training. At many organisations, installing software is painful: it requires multiple forms, time, emails, and signatures. Because this process was so painful, we noticed that many organisations minimised the pain by simply installing all 12,000 CRAN R packages! That way they would avoid future form filling. The logic was that this was a single request to IT, and so saved time in the future. There was also a bit of passing the buck: “IT said it was OK, so it’s no longer my responsibility”. While users should take some responsibility for this “solution”, IT should also take much of the blame. When IT actively hinders people’s ability to do their job, the obvious consequence is that users try to circumvent the barriers.

At Jumping Rivers we now avoid this situation by providing clients with a cloud solution that requires no set-up and no bad security practices.

While this is an extreme example, the pattern is replicated at multiple organisations: hire incredibly clever data scientists, stop them accessing the correct tools, but still expect them to do their work. The result is workarounds: personal laptops used for work, data on USB sticks, and logging on to unsecured wifi to download zip files. At one course I gave, the class would run over to Starbucks at lunchtime to install software, as the office wifi blocked all zip files.

What could go wrong?

Bioconductor is a suite of R packages used for analysing genomic data sets: the sort of data that is very expensive to collect and almost certainly sensitive. The organisations that use Bioconductor include pharmaceutical companies, government organisations, and universities. Basically, anyone involved in serious medical research uses it; most organisations looking at vaccines for COVID-19 would be using Bioconductor.

To install Bioconductor, you would run the following command in R:

source("https://www.bioconductor.org/biocLite.R")

This would download and run an R script. A few things to note about the above code:

  1. It uses https. This is good. But you can still have a secure conversation with the devil!
  2. Running this code involves complete trust in the Bioconductor team. You are giving the author of this script all of your user rights. Any files you can access, they can access. Any program you can run or install, they can run or install.
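One way to reduce the trust placed in a downloaded script is to compare it against a checksum obtained through a separate, trusted channel before running it. The sketch below assumes such a published SHA-256 value exists; the function names are hypothetical, and a matching checksum only proves the file is the one you expected, not that the expected file is benign.

```python
import hashlib
import urllib.request


def sha256_matches(data: bytes, expected_hex: str) -> bool:
    """True if the SHA-256 digest of data equals the expected hex digest."""
    return hashlib.sha256(data).hexdigest() == expected_hex.lower()


def fetch_verified(url: str, expected_hex: str) -> bytes:
    """Download url, but refuse to return the content unless its checksum matches."""
    with urllib.request.urlopen(url) as response:
        data = response.read()
    if not sha256_matches(data, expected_hex):
        raise ValueError(f"checksum mismatch for {url}; refusing to use it")
    return data
```

Only after `fetch_verified` returns would the script be saved or executed; a script fetched from a misspelt domain would fail the check instead of running with the user's full rights.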

The first point, about https, is well known. It’s not that https is insecure; it’s just that https is often mistaken to mean that the content is secure. Indeed, Barclays got into trouble with the ASA (Advertising Standards Authority) in 2018 for making this very claim. So using https is a necessary, but not sufficient, condition for security.
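TLS certificate checks confirm that you are talking to the host named in the URL, not that it is the host you meant; a misspelt domain can hold a perfectly valid certificate. A complementary check is to compare the host against an explicit allowlist before fetching anything. The allowlist contents and function name below are assumptions for illustration.

```python
from urllib.parse import urlsplit

# Hosts we have decided to trust for package installs (an assumed list).
TRUSTED_HOSTS = {"bioconductor.org", "www.bioconductor.org"}


def is_trusted_https_url(url: str) -> bool:
    """Accept only https URLs whose host is exactly on the allowlist."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    return parts.scheme == "https" and host in TRUSTED_HOSTS
```

Exact matching matters: a look-alike such as `bioconductor.org.example.com` contains the trusted name but is a different host, and is rejected.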

The second point, about trust, seems odd at first glance. Of course we trust Bioconductor; we are using their software, after all. However, this assumes you actually install the software from Bioconductor! A few months ago, I bought thirteen domains, all misspellings of the name “bioconductor”, for a total cost of around £100. For example, I had:

  • boconductor.org
  • biconductor.org
  • bioonductor.org
  • etc
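Each example domain is the real name with a single letter missing. Enumerating every such single-deletion variant takes only a few lines, which is why attackers and defenders alike can list them. The sketch covers deletions only; the post does not say exactly how the thirteen domains were chosen, and other typo classes (swapped, doubled or substituted letters) would add many more candidates.

```python
def deletion_typos(name):
    """All distinct strings formed by deleting exactly one character from name."""
    variants = []
    for i in range(len(name)):
        candidate = name[:i] + name[i + 1:]
        if candidate not in variants:  # doubled letters would produce repeats
            variants.append(candidate)
    return variants


typos = deletion_typos("bioconductor")
print(len(typos))    # 12
print(typos[1:4])    # ['boconductor', 'biconductor', 'bioonductor']
```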

I then monitored the web logs to see whether anyone tried to download an R script from my domains. On average, I saw fifteen unique IP addresses per day, which included:

  • hits from all major pharmaceutical companies;
  • hits from all of the world’s top ten universities;
  • hits from major government departments.
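Counting unique visitors per day from web server logs needs only a few lines. The sketch below assumes logs in the widely used Common Log Format; the sample lines use reserved documentation IP ranges and are invented, and this is not the author's actual tooling. Attributing addresses to organisations, as in the list above, needs an extra lookup step that is not shown.

```python
import re
from collections import defaultdict

# Captures the client IP and the date part of a Common Log Format line.
LOG_LINE = re.compile(r"^(\S+) \S+ \S+ \[(\d{2}/\w{3}/\d{4}):")


def unique_ips_per_day(lines):
    """Map each day (as written in the log, e.g. '10/Oct/2019') to its set of client IPs."""
    ips_by_day = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line)
        if match:  # silently skip lines in any other format
            ip, day = match.groups()
            ips_by_day[day].add(ip)
    return ips_by_day


sample = [
    '203.0.113.9 - - [10/Oct/2019:13:55:36 +0000] "GET /biocLite.R HTTP/1.1" 404 209',
    '203.0.113.9 - - [10/Oct/2019:14:02:11 +0000] "GET /biocLite.R HTTP/1.1" 404 209',
    '198.51.100.4 - - [10/Oct/2019:15:20:00 +0000] "GET /biocLite.R HTTP/1.1" 404 209',
    '198.51.100.4 - - [11/Oct/2019:09:01:45 +0000] "GET /biocLite.R HTTP/1.1" 404 209',
]
counts = {day: len(ips) for day, ips in unique_ips_per_day(sample).items()}
print(counts)                               # {'10/Oct/2019': 2, '11/Oct/2019': 1}
print(sum(counts.values()) / len(counts))   # 1.5 unique IPs per day on average
```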

Remember, whenever anyone runs a script from one of these domains, it is equivalent to opening their laptop and giving an attacker full access. Depending on how the attacker was feeling, they could delete everything on the laptop or, perhaps more nefariously, change a couple of values in an Excel spreadsheet, potentially causing millions of pounds’ worth of damage.

Where does the fault lie for the potential security breach? I would argue that it’s not just the users’ fault, as mistakes happen. Fault also lies partly with Bioconductor for encouraging this installation method (this has now been changed), and with the organisations that use Bioconductor. Why was it not installed securely site-wide, removing the need for anyone to install the software themselves?

Getting the Basics Right

As organisations become more data-driven, this in turn leads to a proliferation of different technologies, such as Shiny dashboards, API endpoints, and Flask apps. This is normal, and not an issue in itself. However, it is all too easy to let weak security practices creep in. For example, whenever we at Jumping Rivers engage with a company, we work through a standard checklist of gotchas (and solutions):

  • Who is in charge of updating packages, e.g. R or Python packages?
  • How do you monitor old dashboards for potential security vulnerabilities?
  • How are your API keys generated, stored and shared?
  • Do you use and enforce two factor authentication?
  • How do you monitor your cloud computing resources?
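For the two-factor authentication item, many authenticator apps implement time-based one-time passwords (TOTP, RFC 6238), which build on HOTP (RFC 4226). The sketch below shows the core calculation using only the standard library and the common defaults of HMAC-SHA1, 30-second steps and six digits. It is meant to show how the codes are derived; a production system should use a maintained library and compare submitted codes in constant time.

```python
import base64
import hashlib
import hmac
import struct
import time


def hotp(key, counter, digits=6):
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA1."""
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation: low 4 bits pick the offset
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret_b32, at=None, step=30, digits=6):
    """Time-based one-time password (RFC 6238) for a base32-encoded shared secret."""
    key = base64.b32decode(secret_b32, casefold=True)
    now = time.time() if at is None else at
    return hotp(key, int(now // step), digits)


# The RFC test secret is the ASCII string "12345678901234567890".
print(totp("GEZDGNBVGY3TQOJQGEZDGNBVGY3TQOJQ", at=59))  # 287082
```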

Digital security around data science seems to be in its infancy. However, with a little thought, it is possible to apply standard security practices from other areas to tighten up the process.
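The checklist question on how API keys are generated, stored and shared has fairly standard answers borrowed from general web security: generate keys from a cryptographically secure source, store only a hash of each key on the server, and keep keys out of code and notebooks. A minimal sketch of those three habits follows; the function names and the `DASHBOARD_API_KEY` environment variable are hypothetical.

```python
import hashlib
import hmac
import os
import secrets


def new_api_key():
    """Return (key to hand to the client once, SHA-256 hash to store server-side)."""
    key = secrets.token_urlsafe(32)  # 32 random bytes encoded as URL-safe text
    return key, hashlib.sha256(key.encode()).hexdigest()


def key_matches(presented, stored_hash):
    """Check a presented key against the stored hash using a constant-time comparison."""
    presented_hash = hashlib.sha256(presented.encode()).hexdigest()
    return hmac.compare_digest(presented_hash, stored_hash)


def load_service_key(var_name="DASHBOARD_API_KEY"):
    """Read a key from the environment instead of hard-coding it in scripts."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"{var_name} is not set")
    return key
```

Storing only the hash means a leaked database table does not hand out working keys, and reading keys from the environment keeps them out of version control.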

Notes on the Bioconductor Study

  • The Bioconductor installation process was changed around 18 months ago. When you discover a vulnerability it’s good practice to give the organisation reasonable time to fix it.
  • On my fake bioconductor domains, I never returned an R script. I simply returned a 404 (page not found) error. This allowed me to demonstrate the potential security vulnerability without actually compromising any organisation. Doing the latter would cross the line into illegality.
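The 404-only approach described in the notes is simple to reproduce for legitimate monitoring. The sketch below uses Python's standard `http.server`, routes each request's client address and request line through `logging`, and never serves any content. It is a minimal illustration rather than the author's setup; the handler name is hypothetical, and a real deployment would sit behind a proper web server with TLS.

```python
import logging
from http.server import BaseHTTPRequestHandler, HTTPServer

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")


class NotFoundHandler(BaseHTTPRequestHandler):
    """Answer every GET or HEAD request with 404 and log who asked for what."""

    def do_GET(self):
        # send_error writes the 404 status; the handler's built-in access
        # logging then records the client address and request line.
        self.send_error(404)

    do_HEAD = do_GET

    def log_message(self, fmt, *args):
        # Route the built-in access log through the logging module.
        logging.info("%s %s", self.address_string(), fmt % args)
```

To try it locally, `HTTPServer(("127.0.0.1", 8000), NotFoundHandler).serve_forever()` starts it on port 8000; the resulting log lines are the raw material for counting unique visitors.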

Tags: digital security


The Data Lab is part of the University of Edinburgh, a charitable body registered in Scotland with registration number SC005336.


© 2023 The Data Lab
