What is Data Science?

Before starting a journey, the unavoidable question is, where are we heading ultimately? Having a holistic view of the journey better prepares us to understand the hurdles that await us and keep our expectations aligned with reality.

Advances in computer hardware have brought us more computational power, making it possible to process and store enormous amounts of data. The advent of cloud technology has made data even more accessible. As a result, the science of studying data, so-called “data science”, has become feasible. Nowadays, more and more companies realize the value of data and are undergoing digital transformation, such as moving their business models from offline to online, redesigning current workflows with new digital tools, and combining their products with IoT technology. These changes allow companies to collect data from their users and store it in their databases for future analysis. That data helps them understand users’ preferences and draw new business insights and marketing strategies.

Here is my own definition:

“Data science is about combining mathematical models with domain knowledge to solve complex business problems with the use of technology.”

Now let’s look at how IBM defines Data Science:

Data science combines math and statistics, specialized programming, advanced analytics, artificial intelligence (AI), and machine learning with specific subject matter expertise to uncover actionable insights hidden in an organization’s data. These insights can be used to guide decision making and strategic planning.

The Importance of Data Literacy

The volume of data has grown exponentially over the past 10 years, and this trend is expected to continue.

Volume of data created, captured, copied, and consumed worldwide from 2010 to 2020, with forecasts from 2021 to 2025 (in zettabytes), by Statista

With this massive amount of data, if we cannot break it down and extract actionable insights, the data becomes little more than noise and goes to waste. Stakeholders need a certain level of data literacy in their business context to make informed decisions.

As we are talking about data, there are four important attributes of data we should know:

Volume: The size of data that an organization has collected.

Velocity: The speed at which data is created, and the corresponding speed at which it is collected and analyzed.

Variety: The sources that data is collected from and the diversity of data types.

Veracity: The quality that dictates the accuracy, truthfulness, and applicability of data.

Data that is high volume, high velocity, high variety, and high veracity must be processed with advanced analytics tools and algorithms to reveal its in-depth value.
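To make the four attributes more concrete, here is a minimal Python sketch that computes a rough proxy for each one on a tiny, made-up set of records. The field names ("source", "timestamp", "value"), the sample records, and the function name profile_four_vs are all assumptions made for illustration, and the measures are crude stand-ins rather than standard definitions.

```python
def profile_four_vs(records):
    """Return rough, illustrative proxies for the four attributes of data."""
    if not records:
        raise ValueError("records must not be empty")

    # Volume: how many records were collected.
    volume = len(records)

    # Velocity: records per second over the observed time span.
    timestamps = [r["timestamp"] for r in records]
    span_seconds = max(timestamps) - min(timestamps)
    velocity = volume / span_seconds if span_seconds > 0 else None

    # Variety: distinct sources and distinct Python types among present values.
    sources = {r["source"] for r in records}
    value_types = {
        type(r["value"]).__name__ for r in records if r["value"] is not None
    }

    # Veracity: share of records whose value is present (a crude completeness check).
    complete = sum(1 for r in records if r["value"] is not None)
    veracity = complete / volume

    return {
        "volume": volume,
        "velocity_per_second": velocity,
        "source_count": len(sources),
        "value_type_count": len(value_types),
        "veracity": veracity,
    }


# Hypothetical records; "timestamp" is seconds since collection started.
sample_records = [
    {"source": "web", "timestamp": 0, "value": 21.5},
    {"source": "app", "timestamp": 10, "value": None},
    {"source": "iot", "timestamp": 20, "value": "clicked"},
    {"source": "web", "timestamp": 30, "value": 3},
]

result = profile_four_vs(sample_records)
print(result)
```

In practice, veracity in particular involves far more than missing values (accuracy, consistency, and relevance all matter), so the completeness ratio here is only a starting point.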

Data Science Career

I remember that when I graduated, my school (HKUST) did not offer any program called or associated with Data Science. The closest we could get was Computer Science. Since 2017, however, it has offered the Bachelor of Science in Data Science and Technology (DSCT) program. This suggests that the demand for Data Science specialists is booming.

Here are four common careers associated with Data Science and their nuances:

Data Scientist

Data scientists deal with both structured and unstructured data and use machine learning and statistical techniques to build models that make predictions.
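As a small illustration of what "building a model that makes predictions" can look like, here is a minimal sketch that fits a straight line to a handful of points using ordinary least squares and then predicts a new value. The data and the function names fit_line and predict are invented for this example; real projects typically rely on established libraries and much more careful validation.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    if variance == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predict y for a new x using the fitted line."""
    return slope * x + intercept


# Hypothetical data: advertising spend (x) versus units sold (y).
spend = [1, 2, 3, 4]
units = [3, 5, 7, 9]

slope, intercept = fit_line(spend, units)
forecast = predict(5, slope, intercept)
print(slope, intercept, forecast)
```

The made-up points lie exactly on a line, so the fit is perfect here; real data is noisy, and much of a data scientist's work goes into judging how far such a prediction can be trusted.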

Data Analyst

A data analyst is akin to a data scientist; however, analysts are more likely to work with structured data and to focus on data visualization to present their findings to stakeholders, whereas data scientists lean more toward modelling.

Machine Learning Engineer

Simply put, a machine learning engineer is a software engineer specializing in machine learning techniques. Unlike data scientists, machine learning engineers are more focused on the deployment part of a project. They implement algorithms based on the ML models developed by data scientists.

Data Engineer

Data engineers design and build systems for collecting and warehousing data at scale. They are the architects in the data science team.

Data science is a relatively new career field. There is some overlap between these roles, and the actual job duties of each position may vary from company to company.

Sources:

IBM Cloud Education. (n.d.). What is Data Science. IBM. Retrieved December 8, 2022, from https://www.ibm.com/cloud/learn/data-science-introduction

Statista Research Department. (2022, September 8). Total Data Volume Worldwide 2010-2025. Statista. Retrieved December 8, 2022, from https://www.statista.com/statistics/871513/worldwide-data-created/

HKUST. (n.d.). Program Overview. HKUST BSc in Data Science and Technology. Retrieved December 8, 2022, from https://dsct.hkust.edu.hk/program-overview