
This article examines the salary distribution of data science positions worldwide, analyzes how job title, country, work experience, employment type, and company size affect salary, and offers tips for job hunting and changing jobs.
Author: Han Nobuko@ShowMeAI
Data analysis in practice series: https://www.showmeai.tech/tutorials/40
AI jobs & strategy series: https://www.showmeai.tech/tutorials/47
Address of this article: https://www.showmeai.tech/article-detail/402
Statement: All rights reserved. To reprint, please contact the platform and the author and indicate the source.
💡 Introduction

Data science is still gaining popularity in fields such as the Internet, healthcare, telecommunications, retail, sports, aviation, and the arts. On Glassdoor's list of the best jobs in America, data science ranks third, with about 10,071 job openings for 2022.
Beyond the appeal of working with data, salaries for data science positions also attract a lot of attention. In this article, ShowMeAI uses data to answer the following questions:
- What are the most common jobs in data science?
- Which country has the highest salaries and the most opportunities?
- What is a typical salary range?
- How important is job level to a data scientist?
- Data science: full-time vs. freelance
- What are the highest paying jobs in data science?
- Which data science jobs pay the most on average?
- What are the minimum and maximum salaries for data science professionals?
- What size are the companies hiring data science professionals?
- Is salary related to company size?
- What is the ratio of WFH (remote work) to WFO?
- How do salaries for data science jobs grow each year?
- If someone is looking for a data science job, what should they search for online?
- If you have a few years of experience as an entry-level employee, what size of company should you consider moving to?
💡 Data description
The dataset used here is the Data Science Job Salary Dataset (Kaggle). It can also be downloaded from ShowMeAI's Baidu Netdisk.
Dataset download (Baidu Netdisk): reply "actual combat" to the official account "ShowMeAI Research Center" to get the "ds_salaries" dataset for this article ([37] Data scientist salary analysis and visualization based on pandasql and plotly).
ShowMeAI official GitHub: https://github.com/ShowMeAI-Hub
The dataset contains 11 columns. Their names and meanings are as follows:

| Parameter | Meaning |
|---|---|
| work_year | Year the salary was paid |
| experience_level | Experience level at the time the salary was paid |
| employment_type | Type of employment |
| job_title | Job title |
| salary | Total salary paid |
| salary_currency | Currency of the salary paid |
| salary_in_usd | Salary normalized to USD |
| employee_residence | Employee's primary country of residence |
| remote_ratio | Share of work done remotely (0, 50, or 100) |
| company_location | Country where the employer's principal office is located |
| company_size | Company size based on number of employees |

This analysis uses pandas and SQL. For systematic study and hands-on practice, see ShowMeAI's data analysis tutorials and the corresponding cheat sheets:
- Graphical Data Analysis: From Getting Started to Mastery (tutorial series)
- Programming Language Cheat Sheet | SQL Cheat Sheet
- Data Science Toolkit Cheat Sheet | Pandas Cheat Sheet
- Data Science Tool Library Cheat Sheet | Matplotlib Cheat Sheet
💡 Importing the libraries
We first import the libraries we need: pandas to read the data, and Plotly, seaborn, and matplotlib for visualization. Because this article does its data analysis in SQL, we also use the pandasql library.

    # For loading data
    import pandas as pd
    import numpy as np

    # For SQL queries
    import pandasql as ps

    # For plotting graphs / visualization
    import plotly.graph_objects as go
    import plotly.express as px
    import plotly.figure_factory as ff
    import plotly.io as pio
    import seaborn as sns
    import matplotlib.pyplot as plt

    # To show graphs below the code in the same notebook
    from plotly.offline import iplot, init_notebook_mode
    init_notebook_mode(connected=True)

    # To convert country codes to country names
    import country_converter as coco

    import warnings
    warnings.filterwarnings('ignore')
💡 Loading the dataset
The dataset we downloaded is in CSV format, so we can read it with the read_csv method.

    # Load the data
    salaries = pd.read_csv('ds_salaries.csv')

To see the first five records, we can use the salaries.head() method.

The same task with pandasql looks like this:

    # Helper that runs a SQL query against the DataFrames in scope
    def query(sql):
        return ps.sqldf(sql)

    # Show the first 5 rows of the data
    query("""
        SELECT *
        FROM salaries
        LIMIT 5
    """)
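By default, pandasql runs its queries on an in-memory SQLite database, so the SQL in this article follows SQLite's rules. The sketch below mimics that idea with Python's built-in sqlite3 module. The three rows and the reduced column set are made up for illustration and are not taken from the dataset.

```python
import sqlite3

# Hypothetical rows mimicking three columns of the salary table.
rows = [
    (2020, "Data Scientist", 79833),
    (2021, "Data Engineer", 82528),
    (2022, "Data Analyst", 99000),
]

conn = sqlite3.connect(":memory:")  # in-memory database, gone when closed
conn.execute(
    "CREATE TABLE salaries (work_year INTEGER, job_title TEXT, salary_in_usd INTEGER)"
)
conn.executemany("INSERT INTO salaries VALUES (?, ?, ?)", rows)

# Same shape of query as in the article, limited to two rows here.
first_two = conn.execute(
    "SELECT * FROM salaries ORDER BY rowid LIMIT 2"
).fetchall()
print(first_two)
```

Because the engine is SQLite, functions such as REPLACE and ROUND used later behave as SQLite defines them.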

💡 Data preprocessing
The first column of the dataset, "Unnamed: 0", is just a row index, so we drop it before the analysis:

    salaries = salaries.drop('Unnamed: 0', axis=1)

Let's check for missing values in the dataset:

    salaries.isna().sum()

Output:

    work_year             0
    experience_level      0
    employment_type       0
    job_title             0
    salary                0
    salary_currency       0
    salary_in_usd         0
    employee_residence    0
    remote_ratio          0
    company_location      0
    company_size          0
    dtype: int64

The dataset has no missing values, so no missing-value handling is needed. The employee_residence and company_location columns use short country codes, which we replace with full country names for readability:

    # Convert country codes to country names
    salaries["employee_residence"] = coco.convert(names=salaries["employee_residence"], to="name")
    salaries["company_location"] = coco.convert(names=salaries["company_location"], to="name")
The experience_level column encodes experience levels with the following abbreviations:
- EN: Entry Level
- MI: Mid level
- SE: Senior Level
- EX: Expert Level
For readability, we replace these abbreviations with their full names.

    # Replace the codes in column experience_level
    salaries['experience_level'] = query("""
        SELECT REPLACE(REPLACE(REPLACE(REPLACE(
                   experience_level,
                   'MI', 'Mid level'),
                   'SE', 'Senior Level'),
                   'EN', 'Entry Level'),
                   'EX', 'Expert Level')
        FROM salaries
    """)
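As an alternative to nesting REPLACE calls, the same recoding can be written as a single CASE expression, which maps each code exactly once and leaves unknown codes untouched. This is a minimal sketch using Python's built-in sqlite3 module on a hypothetical four-row table, not the article's own method.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE salaries (experience_level TEXT)")
# Hypothetical rows: one per experience code.
conn.executemany(
    "INSERT INTO salaries VALUES (?)", [("EN",), ("MI",), ("SE",), ("EX",)]
)

sql = """
SELECT CASE experience_level
           WHEN 'EN' THEN 'Entry Level'
           WHEN 'MI' THEN 'Mid level'
           WHEN 'SE' THEN 'Senior Level'
           WHEN 'EX' THEN 'Expert Level'
           ELSE experience_level  -- keep any unexpected code as-is
       END AS experience_level
FROM salaries
ORDER BY rowid
"""
labels = [row[0] for row in conn.execute(sql)]
print(labels)
```

The same query text could be passed to a pandasql helper, assuming the table is a DataFrame named salaries.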
In the same way, we replace the employment type codes with full names:
- FT: Full Time
- PT: Part Time
- CT: Contract
- FL: Freelance

    # Replace the codes in column employment_type
    salaries['employment_type'] = query("""
        SELECT REPLACE(REPLACE(REPLACE(REPLACE(
                   employment_type,
                   'PT', 'Part Time'),
                   'FT', 'Full Time'),
                   'FL', 'Freelance'),
                   'CT', 'Contract')
        FROM salaries
    """)
The company_size column is recoded as follows:
- S: Small
- M: Medium
- L: Large

    # Replace the codes in column company_size
    salaries['company_size'] = query("""
        SELECT REPLACE(REPLACE(REPLACE(
                   company_size,
                   'M', 'Medium'),
                   'L', 'Large'),
                   'S', 'Small')
        FROM salaries
    """)
We also recode the remote_ratio column to make it easier to read:

    # Replace the values in column remote_ratio
    salaries['remote_ratio'] = query("""
        SELECT REPLACE(REPLACE(REPLACE(
                   remote_ratio,
                   '100', 'Fully Remote'),
                   '50', 'Partially Remote'),
                   '0', 'Non Remote Work')
        FROM salaries
    """)

This is the final output after preprocessing.
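The nested REPLACE calls for remote_ratio only work because the longest value, '100', is replaced first. REPLACE substitutes every occurrence of a substring, so replacing '0' first would also rewrite the zeros inside '100' and '50'. The sketch below shows both orders using Python's built-in sqlite3 module. The relabel helper is a hypothetical name introduced for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def relabel(value, order):
    """Apply nested SQLite REPLACE calls to `value` in the given order."""
    expr = "?"
    params = [str(value)]
    for old, new in order:
        # Wrap the previous expression: REPLACE(<expr>, old, new)
        expr = f"REPLACE({expr}, ?, ?)"
        params += [old, new]
    return conn.execute(f"SELECT {expr}", params).fetchone()[0]


good_order = [
    ("100", "Fully Remote"),
    ("50", "Partially Remote"),
    ("0", "Non Remote Work"),
]
bad_order = list(reversed(good_order))  # replaces '0' first

print(relabel(100, good_order))  # clean label
print(relabel(100, bad_order))   # corrupted label
```

With the reversed order, the value 100 never becomes 'Fully Remote' because its zeros have already been rewritten.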

💡 Data Analysis & Visualization
📦 What are the most common jobs in data science?

    top10_jobs = query("""
        SELECT job_title,
               COUNT(*) AS job_count
        FROM salaries
        GROUP BY job_title
        ORDER BY job_count DESC
        LIMIT 10
    """)

Let's draw a bar chart to make this easier to see:

    data = go.Bar(
        x=top10_jobs['job_title'],
        y=top10_jobs['job_count'],
        text=top10_jobs['job_count'],
        textposition='inside',
        textfont=dict(size=12, color='white'),
        marker=dict(color=px.colors.qualitative.Alphabet, opacity=0.9,
                    line_color='black', line_width=1),
    )
    layout = go.Layout(
        title={'text': "<b>Top 10 Data Science Jobs</b>", 'x': 0.5, 'xanchor': 'center'},
        xaxis=dict(title='<b>Job Title</b>', tickmode='array'),
        yaxis=dict(title='<b>Total</b>'),
        width=900, height=600,
    )
    fig = go.Figure(data=data, layout=layout)
    fig.update_layout(plot_bgcolor='#f1e7d2', paper_bgcolor='#f1e7d2')
    fig.show()

📦 Market Distribution of Data Science Jobs

    fig = px.pie(top10_jobs, values='job_count', names='job_title',
                 color_discrete_sequence=px.colors.qualitative.Alphabet)
    fig.update_layout(
        title={'text': "<b>Distribution of job positions</b>", 'x': 0.5, 'xanchor': 'center'},
        width=900, height=600,
    )
    fig.update_layout(plot_bgcolor='#f1e7d2', paper_bgcolor='#f1e7d2')
    fig.show()

📦 Countries with the Most Data Science Jobs

    top10_com_loc = query("""
        SELECT company_location AS company,
               COUNT(*) AS job_count
        FROM salaries
        GROUP BY company
        ORDER BY job_count DESC
        LIMIT 10
    """)

    data = go.Bar(
        x=top10_com_loc['company'],
        y=top10_com_loc['job_count'],
        textfont=dict(size=12, color='white'),
        marker=dict(color=px.colors.qualitative.Alphabet, opacity=0.9,
                    line_color='black', line_width=1),
    )
    layout = go.Layout(
        title={'text': "<b>Top 10 Data Science Countries</b>", 'x': 0.5, 'xanchor': 'center'},
        xaxis=dict(title='<b>Countries</b>', tickmode='array'),
        yaxis=dict(title='<b>Total</b>'),
        width=900, height=600,
    )
    fig = go.Figure(data=data, layout=layout)
    fig.update_layout(plot_bgcolor='#f1e7d2', paper_bgcolor='#f1e7d2')
    fig.show()

The chart above shows that the United States has the most data science job opportunities. Next we look at salaries around the world. Run the code below to see the map.

    # Add a short country name column (reused later for the per-country averages)
    salaries["company_country"] = coco.convert(names=salaries["company_location"], to='name_short')

    # Total salary per country, log-scaled for the color axis
    temp_df = salaries.groupby('company_country')['salary_in_usd'].sum().reset_index()
    temp_df['salary_scale'] = np.log10(temp_df['salary_in_usd'])

    fig = px.choropleth(
        temp_df,
        locationmode='country names',
        locations="company_country",
        color="salary_scale",
        hover_name="company_country",
        hover_data=['salary_in_usd'],
        color_continuous_scale='Jet',
    )
    fig.update_layout(title={'text': '<b>Salaries across the World</b>', 'xanchor': 'center', 'x': 0.5})
    fig.update_layout(plot_bgcolor='#f1e7d2', paper_bgcolor='#f1e7d2')
    fig.show()
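The map colors countries by the base-10 logarithm of their salary totals rather than by the raw totals. A short sketch of why that helps: totals that differ by orders of magnitude end up only a few units apart, so smaller countries remain distinguishable on the color scale. The country names and totals below are hypothetical.

```python
import math

# Hypothetical per-country salary totals in USD (not taken from the dataset).
totals = {
    "Country A": 50_000,
    "Country B": 5_000_000,
    "Country C": 50_000_000,
}

# log10 turns multiplicative gaps into additive ones.
scaled = {name: math.log10(total) for name, total in totals.items()}

raw_ratio = totals["Country C"] / totals["Country A"]  # factor between totals
log_gap = scaled["Country C"] - scaled["Country A"]    # gap on the color axis

print(raw_ratio, round(log_gap, 6))
```

A thousandfold difference in totals becomes a gap of three units on the logarithmic scale.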
📦 Average Salary (Currency Based)

    df = (salaries.groupby('salary_currency', as_index=False)['salary_in_usd']
                  .mean()
                  .sort_values('salary_in_usd', ascending=False))

    # Select the top 14
    df = df.iloc[:14]

    fig = px.bar(df, x='salary_currency', y='salary_in_usd',
                 color='salary_currency',
                 color_discrete_sequence=px.colors.qualitative.Safe)
    fig.update_layout(
        title={'text': '<b>Average salary as a function of currency</b>', 'xanchor': 'center', 'x': 0.5},
        xaxis_title='<b>Currency</b>',
        yaxis_title='<b>Mean Salary</b>',
    )
    fig.update_layout(plot_bgcolor='#f1e7d2', paper_bgcolor='#f1e7d2')
    fig.show()

Salaries paid in US dollars are the highest on average, followed by Swiss francs and Singapore dollars. Next, the average salary by company location:

    df = (salaries.groupby('company_country', as_index=False)['salary_in_usd']
                  .mean()
                  .sort_values('salary_in_usd', ascending=False))

    # Select the top 14
    df = df.iloc[:14]

    fig = px.bar(df, x='company_country', y='salary_in_usd',
                 color='company_country',
                 color_discrete_sequence=px.colors.qualitative.Dark2)
    fig.update_layout(
        title={'text': "<b>Average salary as a function of company location</b>", 'x': 0.5, 'xanchor': 'center'},
        xaxis=dict(title='<b>Company Location</b>', tickmode='array'),
        yaxis=dict(title='<b>Mean Salary</b>'),
        width=900, height=600,
    )
    fig.update_layout(plot_bgcolor='#f1e7d2', paper_bgcolor='#f1e7d2')
    fig.show()

📦 Data Science Work Experience Level Distribution

    job_exp = query("""
        SELECT experience_level,
               COUNT(*) AS job_count
        FROM salaries
        GROUP BY experience_level
        ORDER BY job_count ASC
    """)

    data = go.Bar(
        x=job_exp['job_count'],
        y=job_exp['experience_level'],
        orientation='h',
        text=job_exp['job_count'],
        marker=dict(color=px.colors.qualitative.Alphabet, opacity=0.9,
                    line_color='white', line_width=2),
    )
    layout = go.Layout(
        title={'text': "<b>Jobs on Experience Levels</b>", 'x': 0.5, 'xanchor': 'center'},
        xaxis=dict(title='<b>Total</b>', tickmode='array'),
        yaxis=dict(title='<b>Experience lvl</b>'),
        width=900, height=600,
    )
    fig = go.Figure(data=data, layout=layout)
    fig.update_layout(plot_bgcolor='#f1e7d2', paper_bgcolor='#f1e7d2')
    fig.show()

As the chart above shows, most data science positions are at the senior level, and very few are at the expert level.
📦 Data Science Jobs Employment Type Distribution

    job_emp = query("""
        SELECT employment_type,
               COUNT(*) AS job_count
        FROM salaries
        GROUP BY employment_type
        ORDER BY job_count ASC
    """)

    data = go.Bar(
        x=job_emp['job_count'],
        y=job_emp['employment_type'],
        orientation='h',
        text=job_emp['job_count'],
        textposition='outside',
        marker=dict(color=px.colors.qualitative.Alphabet, opacity=0.9,
                    line_color='white', line_width=2),
    )
    layout = go.Layout(
        title={'text': "<b>Jobs on Employment Type</b>", 'x': 0.5, 'xanchor': 'center'},
        xaxis=dict(title='<b>Total</b>', tickmode='array'),
        yaxis=dict(title='<b>Emp Type lvl</b>'),
        width=900, height=600,
    )
    fig = go.Figure(data=data, layout=layout)
    fig.update_layout(plot_bgcolor='#f1e7d2', paper_bgcolor='#f1e7d2')
    fig.show()

The chart above shows that the majority of data scientists work full-time, with far fewer contract workers and freelancers.
📦 Data Science Jobs Trends

    # Order by year so the line is drawn chronologically
    job_year = query("""
        SELECT work_year,
               COUNT(*) AS job_count
        FROM salaries
        GROUP BY work_year
        ORDER BY work_year
    """)

    data = go.Scatter(
        x=job_year['work_year'],
        y=job_year['job_count'],
        marker=dict(size=20, line_width=1.5, line_color='white',
                    color=px.colors.qualitative.Alphabet),
        line=dict(color='#ED7D31', width=4),
        mode='lines+markers',
    )
    layout = go.Layout(
        title={'text': "<b><i>Data Science jobs Growth (2020 to 2022)</i></b>", 'x': 0.5, 'xanchor': 'center'},
        xaxis=dict(title='<b>Year</b>'),
        yaxis=dict(title='<b>Jobs</b>'),
        width=900, height=600,
    )
    fig = go.Figure(data=data, layout=layout)
    fig.update_xaxes(tickvals=[2020, 2021, 2022])
    fig.update_layout(plot_bgcolor='#f1e7d2', paper_bgcolor='#f1e7d2')
    fig.show()

📦 Data Science Job Salary Distribution

    salary_usd = query("""
        SELECT salary_in_usd
        FROM salaries
    """)

    # Set the theme before creating the figure so the background colors apply
    sns.set(rc={'axes.facecolor': '#f1e7d2', 'figure.facecolor': '#f1e7d2'})
    plt.figure(figsize=(20, 8))
    p = sns.histplot(salary_usd["salary_in_usd"], kde=True, alpha=1, fill=True,
                     edgecolor='black', linewidth=1)
    p.axes.lines[0].set_color("orange")  # color of the KDE curve
    plt.title("Data Science Salary Distribution \n", fontsize=25)
    plt.xlabel("Salary", fontsize=18)
    plt.ylabel("Count", fontsize=18)
    plt.show()

📦 Top 10 Highest Paying Data Science Jobs

    # Group by job title and take the highest salary recorded for each title
    salary_hi10 = query("""
        SELECT job_title,
               MAX(salary_in_usd) AS salary
        FROM salaries
        GROUP BY job_title
        ORDER BY salary DESC
        LIMIT 10
    """)

    data = go.Bar(
        x=salary_hi10['salary'],
        y=salary_hi10['job_title'],
        orientation='h',
        text=salary_hi10['salary'],
        textposition='inside',
        insidetextanchor='middle',
        textfont=dict(size=13, color='black'),
        marker=dict(color=px.colors.qualitative.Alphabet, opacity=0.9,
                    line_color='black', line_width=1),
    )
    layout = go.Layout(
        title={'text': "<b>Top 10 Highest paid Data Science Jobs</b>", 'x': 0.5, 'xanchor': 'center'},
        xaxis=dict(title='<b>salary</b>', tickmode='array'),
        yaxis=dict(title='<b>Job Title</b>'),
        width=900, height=600,
    )
    fig = go.Figure(data=data, layout=layout)
    fig.update_layout(plot_bgcolor='#f1e7d2', paper_bgcolor='#f1e7d2')
    fig.show()
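Ranking job titles by their highest recorded salary amounts to grouping rows by job_title and taking MAX(salary_in_usd) within each group. Here is a minimal sketch of that pattern with Python's built-in sqlite3 module. The five rows are hypothetical, and max_salary is an alias chosen here so it cannot be confused with the dataset's own salary column.

```python
import sqlite3

# Hypothetical (job_title, salary_in_usd) rows.
rows = [
    ("Data Scientist", 120000),
    ("Data Scientist", 95000),
    ("Data Engineer", 150000),
    ("Data Engineer", 110000),
    ("Data Analyst", 80000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE salaries (job_title TEXT, salary_in_usd INTEGER)")
conn.executemany("INSERT INTO salaries VALUES (?, ?)", rows)

# One output row per title, carrying that title's maximum salary.
top_paid = conn.execute("""
    SELECT job_title, MAX(salary_in_usd) AS max_salary
    FROM salaries
    GROUP BY job_title
    ORDER BY max_salary DESC
    LIMIT 2
""").fetchall()
print(top_paid)
```

Grouping by the title rather than by the salary value is what guarantees each title appears at most once in the ranking.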

Principal Data Engineer is the highest paying data science job in this dataset.
📦 Average salary and ranking of different positions

    salary_av10 = query("""
        SELECT job_title,
               ROUND(AVG(salary_in_usd)) AS salary
        FROM salaries
        GROUP BY job_title
        ORDER BY salary DESC
        LIMIT 10
    """)

    data = go.Bar(
        x=salary_av10['salary'],
        y=salary_av10['job_title'],
        orientation='h',
        text=salary_av10['salary'],
        textposition='inside',
        insidetextanchor='middle',
        textfont=dict(size=13, color='white'),
        marker=dict(color=px.colors.qualitative.Alphabet, opacity=0.9,
                    line_color='white', line_width=2),
    )
    layout = go.Layout(
        title={'text': "<b>Top 10 Average paid Data Science Jobs</b>", 'x': 0.5, 'xanchor': 'center'},
        xaxis=dict(title='<b>salary</b>', tickmode='array'),
        yaxis=dict(title='<b>Job Title</b>'),
        width=900, height=600,
    )
    fig = go.Figure(data=data, layout=layout)
    fig.update_layout(plot_bgcolor='#f1e7d2', paper_bgcolor='#f1e7d2')
    fig.show()
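An average computed from a single record can easily top a ranking like this one. One possible safeguard, sketched below with Python's built-in sqlite3 module, is to report the group size next to the average and filter out small groups with HAVING. The rows, the threshold of two records, and the aliases avg_salary and n are assumptions made for illustration.

```python
import sqlite3

# Hypothetical (job_title, salary_in_usd) rows; "Head of Data" has one record.
rows = [
    ("Data Scientist", 100000),
    ("Data Scientist", 120000),
    ("Data Engineer", 90000),
    ("Data Engineer", 110000),
    ("Head of Data", 200000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE salaries (job_title TEXT, salary_in_usd INTEGER)")
conn.executemany("INSERT INTO salaries VALUES (?, ?)", rows)

base = """
    SELECT job_title, ROUND(AVG(salary_in_usd)) AS avg_salary, COUNT(*) AS n
    FROM salaries
    GROUP BY job_title
    {having}
    ORDER BY avg_salary DESC
"""
# Naive ranking: every title counts, even with a single record.
naive_top = conn.execute(base.format(having="") + " LIMIT 1").fetchone()
# Filtered ranking: keep only titles with at least two records.
robust = conn.execute(base.format(having="HAVING COUNT(*) >= 2")).fetchall()

print(naive_top)
print(robust)
```

In the naive ranking the single-record title comes first, while the filtered ranking only compares titles backed by several records.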

📦 Data Science Salary Trends

    # Order by year so the line is drawn chronologically
    salary_year = query("""
        SELECT ROUND(AVG(salary_in_usd)) AS salary,
               work_year AS year
        FROM salaries
        GROUP BY year
        ORDER BY year
    """)

    data = go.Scatter(
        x=salary_year['year'],
        y=salary_year['salary'],
        marker=dict(size=20, line_width=1.5, line_color='black', color='#ED7D31'),
        line=dict(color='black', width=4),
        mode='lines+markers',
    )
    layout = go.Layout(
        title={'text': "<b>Data Science Salary Growth (2020 to 2022) </b>", 'x': 0.5, 'xanchor': 'center'},
        xaxis=dict(title='<b>Year</b>'),
        yaxis=dict(title='<b>Salary</b>'),
        width=900, height=600,
    )
    fig = go.Figure(data=data, layout=layout)
    fig.update_xaxes(tickvals=[2020, 2021, 2022])
    fig.update_layout(plot_bgcolor='#f1e7d2', paper_bgcolor='#f1e7d2')
    fig.show()
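A salary trend line is easier to discuss once it is turned into year-over-year growth rates. The sketch below computes them in plain Python. The yearly averages are hypothetical round numbers rather than values from the dataset, and yoy_growth is a helper name introduced here.

```python
# Hypothetical average salaries per year in USD (not taken from the dataset).
avg_salary = {2020: 90_000, 2021: 99_000, 2022: 123_750}


def yoy_growth(series):
    """Return {year: percent change versus the previous year}."""
    years = sorted(series)
    return {
        year: round((series[year] / series[prev] - 1) * 100, 1)
        for prev, year in zip(years, years[1:])
    }


growth = yoy_growth(avg_salary)
print(growth)
```

The first year has no entry because there is no earlier year to compare it with.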

📦 Experience Level & Salary

    salary_exp = query("""
        SELECT experience_level AS 'Experience Level',
               salary_in_usd AS Salary
        FROM salaries
    """)

    fig = px.violin(salary_exp, x='Experience Level', y='Salary',
                    color='Experience Level', box=True)
    fig.update_layout(
        title={'text': "<b>Salary on Experience Level</b>", 'xanchor': 'center', 'x': 0.5},
        xaxis=dict(title='<b>Experience level</b>'),
        yaxis=dict(title='<b>salary</b>'),
        width=900, height=600,
    )
    fig.update_layout(paper_bgcolor='#f1e7d2', plot_bgcolor='#f1e7d2', showlegend=False)
    fig.show()

📦 Salary trends by experience level

    # Median of the USD salary only; the other columns are not numeric
    tmp_df = (salaries.groupby(['work_year', 'experience_level'], as_index=False)['salary_in_usd']
                      .median())

    fig = px.line(tmp_df, x='work_year', y='salary_in_usd',
                  color='experience_level', symbol="experience_level")
    fig.update_layout(
        title={'text': "<b>Median Salary Trend By Experience Level</b>", 'x': 0.5, 'xanchor': 'center'},
        xaxis=dict(title='<b>Working Year</b>', tickvals=[2020, 2021, 2022], tickmode='array'),
        yaxis=dict(title='<b>Salary</b>'),
        width=900, height=600,
    )
    fig.update_layout(plot_bgcolor='#f1e7d2', paper_bgcolor='#f1e7d2')
    fig.show()
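The trend chart uses the median rather than the mean, which keeps a single very large salary from dominating a group. A minimal pure-Python sketch of the same group-then-median step follows. The records are hypothetical and include one deliberate outlier.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical (work_year, experience_level, salary_in_usd) records.
records = [
    (2020, "Entry Level", 50000),
    (2020, "Entry Level", 60000),
    (2020, "Senior Level", 120000),
    (2021, "Entry Level", 55000),
    (2021, "Senior Level", 130000),
    (2021, "Senior Level", 150000),
    (2021, "Senior Level", 400000),  # outlier
]

# Collect salaries per (year, level) group.
groups = defaultdict(list)
for year, level, salary in records:
    groups[(year, level)].append(salary)

median_by_group = {key: median(vals) for key, vals in sorted(groups.items())}
mean_by_group = {key: mean(vals) for key, vals in sorted(groups.items())}

print(median_by_group[(2021, "Senior Level")])
print(round(mean_by_group[(2021, "Senior Level")]))
```

For the 2021 senior group, the outlier pulls the mean far above the median, which is why the median gives a steadier trend line.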

Observations:
1. During the COVID-19 pandemic (2020-2021), expert-level salaries were very high but declined somewhat.
2. After 2021, salaries at the expert and senior levels increased.
📦 Year & Salary Distribution

    year_gp = salaries.groupby('work_year')
    hist_data = [
        year_gp.get_group(2020)['salary_in_usd'],
        year_gp.get_group(2021)['salary_in_usd'],
        year_gp.get_group(2022)['salary_in_usd'],
    ]
    group_labels = ['2020', '2021', '2022']

    fig = ff.create_distplot(hist_data, group_labels, show_hist=False)
    fig.update_layout(
        title={'text': "<b>Salary Distribution By Working Year</b>", 'x': 0.5, 'xanchor': 'center'},
        xaxis=dict(title='<b>Salary</b>'),
        yaxis=dict(title='<b>Kernel Density</b>'),
        width=900, height=600,
    )
    fig.update_layout(plot_bgcolor='#f1e7d2', paper_bgcolor='#f1e7d2')
    fig.show()
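The y-axis of the distribution plot is a kernel density estimate: a smooth curve obtained by placing a small bell curve on every salary and averaging them. The sketch below implements a basic Gaussian version in plain Python to show the idea. The sample values and the bandwidth of 20 are arbitrary assumptions, and plotly chooses its own bandwidth, so its curves will not match these numbers.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function x -> estimated density at x (Gaussian kernel)."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum one bell curve per sample, centered on that sample.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical salaries in thousands of USD.
salaries_k = [60, 80, 100, 120, 140]
density = gaussian_kde(salaries_k, bandwidth=20)

# A density should integrate to about 1; approximate with a Riemann sum.
area = sum(density(x) for x in range(-200, 401))
print(round(area, 3), density(100) > density(60))
```

The curve is highest near the middle of the sample, where the individual bell curves overlap the most.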

📦 Employment Type & Salary

    salary_emp = query("""
        SELECT employment_type AS 'Employment Type',
               salary_in_usd AS Salary
        FROM salaries
    """)

    fig = px.box(salary_emp, x='Employment Type', y='Salary', color='Employment Type')
    fig.update_layout(
        title={'text': "<b>Salary by Employment Type</b>", 'x': 0.5, 'xanchor': 'center'},
        xaxis=dict(title='<b>Employment Type</b>'),
        yaxis=dict(title='<b>Salary</b>'),
        width=900, height=600,
    )
    fig.update_layout(plot_bgcolor='#f1e7d2', paper_bgcolor='#f1e7d2')
    fig.show()
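A box plot summarizes each group by its quartiles: the box spans the first to the third quartile (the interquartile range, IQR), and points beyond 1.5 times the IQR from the box are commonly drawn as outliers. The sketch below computes those quantities with Python's statistics module. The salaries are hypothetical, and the quartile method used here may differ from the one plotly applies, so exact box edges can vary slightly.

```python
from statistics import quantiles

# Hypothetical salaries in thousands of USD, sorted for readability.
salaries_k = [40, 55, 63, 80, 101, 120, 150, 165, 210, 450]

# Quartiles with linear interpolation that includes the min and max.
q1, q2, q3 = quantiles(salaries_k, n=4, method="inclusive")
iqr = q3 - q1

# Conventional 1.5 * IQR fences for flagging outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [s for s in salaries_k if s < lower_fence or s > upper_fence]

print(q1, q2, q3, iqr, outliers)
```

Only the largest value falls outside the fences here, which is how a handful of very high salaries show up as isolated points above a box.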

📦 Company size distribution

    comp_size = query("""
        SELECT company_size,
               COUNT(*) AS count
        FROM salaries
        GROUP BY company_size
    """)

    data = go.Pie(
        labels=comp_size['company_size'],
        values=comp_size['count'].values,
        hoverinfo='label',
        hole=0.5,
        textfont_size=16,
        textposition='auto',
    )
    fig = go.Figure(data=data)
    fig.update_layout(
        title={'text': "<b>Company Size</b>", 'x': 0.5, 'xanchor': 'center'},
        width=900, height=600,
    )
    fig.update_layout(plot_bgcolor='#f1e7d2', paper_bgcolor='#f1e7d2')
    fig.show()
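The donut chart turns the grouped counts into percentage shares. The arithmetic is sketched below in plain Python. The counts are an assumption: they were chosen so that the shares match the percentages reported in this article's conclusion, and they are not read from the CSV file here.

```python
# Assumed record counts per company size (chosen to match the article's
# reported shares; not loaded from the dataset in this sketch).
counts = {"Medium": 326, "Large": 198, "Small": 83}

total = sum(counts.values())
# Share of each size as a percentage, rounded to one decimal place.
shares = {size: round(100 * n / total, 1) for size, n in counts.items()}

print(total, shares)
```

Rounded shares do not always add up to exactly 100, so small discrepancies between a chart and a table are usually rounding effects.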

📦 Experience level ratio by company size

    # Counts per (company size, experience level); levels sort alphabetically
    df = salaries.groupby(['company_size', 'experience_level']).size()

    # Share of each experience level within each company size
    comp_s = np.round(df['Small'].values / df['Small'].values.sum(), 2)
    comp_m = np.round(df['Medium'].values / df['Medium'].values.sum(), 2)
    comp_l = np.round(df['Large'].values / df['Large'].values.sum(), 2)

    fig = go.Figure()
    categories = ['Entry Level', 'Expert Level', 'Mid level', 'Senior Level']

    fig.add_trace(go.Scatterpolar(r=comp_s, theta=categories, fill='toself', name='Company Size S'))
    fig.add_trace(go.Scatterpolar(r=comp_m, theta=categories, fill='toself', name='Company Size M'))
    fig.add_trace(go.Scatterpolar(r=comp_l, theta=categories, fill='toself', name='Company Size L'))

    fig.update_layout(
        polar=dict(radialaxis=dict(range=[0, 0.6])),
        showlegend=True,
    )
    fig.update_layout(
        title={'text': "<b>Proportion of Experience Level In Different Company Sizes</b>", 'x': 0.5, 'xanchor': 'center'},
        width=900, height=600,
    )
    fig.update_layout(plot_bgcolor='#f1e7d2', paper_bgcolor='#f1e7d2')
    fig.show()
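The radar chart pairs each share with a category by position, which assumes every company size contains all four experience levels in the same order. A more defensive variant looks each level up by name and fills in zero when a level is absent. Below is a pure-Python sketch of that idea. The records and the level_shares helper are hypothetical.

```python
from collections import Counter

LEVELS = ["Entry Level", "Expert Level", "Mid level", "Senior Level"]

# Hypothetical (company_size, experience_level) records.
# Note that "Small" has no expert-level record.
records = [
    ("Small", "Entry Level"), ("Small", "Entry Level"),
    ("Small", "Mid level"), ("Small", "Senior Level"),
    ("Large", "Senior Level"), ("Large", "Senior Level"),
    ("Large", "Mid level"), ("Large", "Expert Level"),
    ("Large", "Entry Level"),
]


def level_shares(rows, size):
    """Share of each experience level within one company size, in LEVELS order."""
    counts = Counter(level for s, level in rows if s == size)
    total = sum(counts.values())
    # Counter returns 0 for missing levels, so the output always has 4 entries.
    return [round(counts[level] / total, 2) if total else 0.0 for level in LEVELS]


small = level_shares(records, "Small")
large = level_shares(records, "Large")
print(small, large)
```

Each list has one value per category, so it can be passed as the radial values of a trace without shifting the labels.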

📦 Different company size & job salary

    salary_size = query("""
        SELECT company_size AS 'Company size',
               salary_in_usd AS Salary
        FROM salaries
    """)

    fig = px.box(salary_size, x='Company size', y='Salary', color='Company size')
    fig.update_layout(
        title={'text': "<b>Salary by Company size</b>", 'x': 0.5, 'xanchor': 'center'},
        xaxis=dict(title='<b>Company size</b>'),
        yaxis=dict(title='<b>Salary</b>'),
        width=900, height=600,
    )
    fig.update_layout(plot_bgcolor='#f1e7d2', paper_bgcolor='#f1e7d2')
    fig.show()

📦 Ratio of WFH (remote work) and WFO

    rem_type = query("""
        SELECT remote_ratio,
               COUNT(*) AS total
        FROM salaries
        GROUP BY remote_ratio
    """)

    data = go.Pie(
        labels=rem_type['remote_ratio'],
        values=rem_type['total'].values,
        hoverinfo='label',
        hole=0.4,
        textfont_size=18,
        textposition='auto',
    )
    fig = go.Figure(data=data)
    fig.update_layout(
        title={'text': "<b>Remote Ratio</b>", 'x': 0.5, 'xanchor': 'center'},
        width=900, height=600,
    )
    fig.update_layout(plot_bgcolor='#f1e7d2', paper_bgcolor='#f1e7d2')
    fig.show()

📦 Salary by remote type

    salary_remote = query("""
        SELECT remote_ratio AS 'Remote type',
               salary_in_usd AS Salary
        FROM salaries
    """)

    fig = px.box(salary_remote, x='Remote type', y='Salary', color='Remote type')
    fig.update_layout(
        title={'text': "<b>Salary by Remote Type</b>", 'x': 0.5, 'xanchor': 'center'},
        xaxis=dict(title='<b>Remote type</b>'),
        yaxis=dict(title='<b>Salary</b>'),
        width=900, height=600,
    )
    fig.update_layout(plot_bgcolor='#f1e7d2', paper_bgcolor='#f1e7d2')
    fig.show()

📦 Different experience levels & remote ratios

    # After count(), every column holds the group size; work_year is used as the count
    exp_remote = salaries.groupby(['experience_level', 'remote_ratio']).count()
    exp_remote.reset_index(inplace=True)

    fig = px.histogram(exp_remote, x='experience_level', y='work_year',
                       color='remote_ratio', barmode='group', text_auto=True)
    fig.update_layout(
        title={'text': "<b>Respondent Count In Different Experience Level Based on Remote Ratio</b>", 'x': 0.5, 'xanchor': 'center'},
        xaxis=dict(title='<b>Experience Level</b>'),
        yaxis=dict(title='<b>Number of Respondents</b>'),
        width=900, height=600,
    )
    fig.update_layout(plot_bgcolor='#f1e7d2', paper_bgcolor='#f1e7d2')
    fig.show()

💡 Conclusions
- The three most common data science jobs are Data Scientist, Data Engineer, and Data Analyst.
- Data science jobs are becoming more popular: their share of the records in the dataset rose from 11.9% in 2020 to 52.4% in 2022.
- The United States is the country with the most data science companies.
- The interquartile range of the salary distribution runs from about $62.7k to $150k.
- Most data science employees are at the senior level, with far fewer at the expert level.
- Most data science employees work full-time, with few contractors and freelancers.
- Principal Data Engineer is the highest paying data science job.
- The minimum salary (entry-level experience) is $4,000, and the maximum salary (expert-level experience) is $600,000.
- Company composition: 53.7% medium-sized companies, 32.6% large companies, and 13.7% small companies.
- Salaries are also affected by company size, with larger companies paying higher salaries.
- 62.8% of data science jobs are fully remote, 20.9% are non-remote, and 16.3% are partially remote.
- Data science salaries grow with time and experience.
References
- Glassdoor
- pandasql
- Data Science Job Salary Dataset (Kaggle)
- Graphical data analysis: from entry to mastery tutorial series: https://www.showmeai.tech/tutorials/33
- Programming Language Cheat Sheet | SQL Cheat Sheet: https://www.showmeai.tech/article-detail/99
- Data Science Toolkit Cheat Sheet | Pandas Cheat Sheet: https://www.showmeai.tech/article-detail/101
- Data Science Tool Library Cheat Sheet | Matplotlib Cheat Sheet: https://www.showmeai.tech/article-detail/103