Premium Content

PYTHON BASICS

Learn the fundamental concepts of Python programming

Beginner

Module 1: Python Fundamentals

Python is a versatile programming language widely used in data science for its simplicity and powerful libraries. In this module, you'll learn the basic syntax, data types, and control structures that form the foundation of Python programming.

Try It Yourself:

Python

# Python Basics for Data Science
print("Welcome to Data Science with Python!")

# Variables and Data Types
name = "Data Analyst"
age = 28
height = 5.9
is_programmer = True

print(f"Name: {name}")
print(f"Age: {age}")
print(f"Height: {height}")
print(f"Is Programmer: {is_programmer}")

# Lists - Ordered, mutable collection
fruits = ["apple", "banana", "cherry", "date"]
print(f"\nFruits: {fruits}")
print(f"First fruit: {fruits[0]}")

# Dictionaries - Key-value pairs
person = {
    "name": "Alice",
    "age": 30,
    "city": "Data City"
}
print(f"\nPerson: {person}")
print(f"City: {person['city']}")

# Control Structures
print("\n--- Control Structures ---")
for fruit in fruits:
    print(f"I like {fruit}")

# Conditional statements
temperature = 25
if temperature > 30:
    print("It's hot outside!")
elif temperature > 20:
    print("Perfect weather for data analysis!")
else:
    print("It's cool outside!")

Key Points:

Python uses simple, readable syntax with indentation for code blocks
Variables need no explicit type declaration; Python determines the type from the assigned value at runtime
Common data types include integers, floats, strings, booleans, lists, and dictionaries
Lists are ordered, mutable collections that can hold different data types
Dictionaries store data as key-value pairs for efficient lookup
Control structures (if/else, for loops) help manage program flow
f-strings provide a convenient way to format strings with variables
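
Several of the points above can be seen together in a short sketch. The names used here (`value`, `mixed`, `inventory`, `price`) are illustrative and do not come from the module's example.

```python
# Dynamic typing: the same name can be rebound to values of different types.
value = 42
type_before = type(value).__name__   # 'int'
value = "forty-two"
type_after = type(value).__name__    # 'str'

# Lists are mutable and may mix types.
mixed = [1, "two", 3.0]
mixed.append(True)
mixed[0] = "one"

# Dictionaries: square brackets raise KeyError for a missing key,
# while .get() returns a default instead.
inventory = {"apple": 3, "banana": 0}
apple_count = inventory["apple"]
cherry_count = inventory.get("cherry", 0)

# f-strings accept format specifiers after a colon.
price = 1234.5
summary = f"{apple_count} apples at ${price:,.2f}"

print(type_before, type_after)
print(mixed)
print(cherry_count)
print(summary)
```

Using `.get()` with a default is a common choice when a missing key is expected rather than an error.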

DATAFRAMES & SERIES

Learn to work with Pandas for data manipulation

Beginner

Module 2: Pandas DataFrames and Series

Pandas is the most popular Python library for data manipulation and analysis. It provides powerful data structures like DataFrames (2D tables) and Series (1D arrays) that make working with structured data intuitive and efficient.

Try It Yourself:

Python

import pandas as pd

# Creating a Series (1D array)
print("--- Pandas Series ---")
temperatures = pd.Series([22, 25, 19, 30, 27], 
                         index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri'])
print(temperatures)
print(f"Wednesday temperature: {temperatures['Wed']}°C")

# Creating a DataFrame (2D table)
print("\n--- Pandas DataFrame ---")
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'Age': [25, 30, 35, 28],
    'City': ['New York', 'London', 'Tokyo', 'Paris'],
    'Salary': [50000, 60000, 70000, 55000]
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# Basic DataFrame operations
print(f"\nDataFrame shape: {df.shape}")
print(f"Column names: {list(df.columns)}")
print("\nFirst 2 rows:")
print(df.head(2))

# Filtering data
print("\n--- Filtering Data ---")
high_earners = df[df['Salary'] > 55000]
print("People earning more than 55,000:")
print(high_earners)

# Adding a new column
df['Bonus'] = df['Salary'] * 0.1
print("\nDataFrame with Bonus column:")
print(df)

# Basic statistics
print("\n--- Basic Statistics ---")
print(f"Average salary: ${df['Salary'].mean():.2f}")
print(f"Maximum age: {df['Age'].max()}")

Key Points:

Pandas Series are one-dimensional labeled arrays that can hold any data type
DataFrames are two-dimensional labeled data structures with columns of potentially different types
Use pd.DataFrame() to create DataFrames from dictionaries, lists, or other data sources
The head() method displays the first few rows of a DataFrame
You can filter DataFrames using boolean conditions
New columns can be added by assigning values to a new column name
Pandas provides many built-in methods for statistical analysis (mean(), max(), etc.)
DataFrames automatically align data based on index labels
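
The last point, alignment on index labels, is easiest to see with two Series whose labels only partly overlap. This is a minimal sketch with made-up values; `q1` and `q2` are hypothetical names.

```python
import pandas as pd

# Two Series with partly overlapping labels (made-up values).
q1 = pd.Series([100, 200, 300], index=["Alice", "Bob", "Charlie"])
q2 = pd.Series([10, 20, 30], index=["Bob", "Charlie", "Diana"])

# Addition matches values by label, not by position.
# Labels present in only one Series produce NaN.
total = q1 + q2
print(total)

# fill_value treats a missing label as 0 instead of producing NaN.
total_filled = q1.add(q2, fill_value=0)
print(total_filled)
```

Whether NaN or a filled value is the right outcome depends on what a missing label means in the data at hand.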

DATA VISUALIZATION

Learn to create insightful charts and graphs

Intermediate

Module 3: Data Visualization with Matplotlib

Data visualization is crucial for understanding patterns, trends, and relationships in data. Matplotlib, Python's primary plotting library, provides a flexible foundation for creating many types of charts and graphs.

Try It Yourself:

Python

import matplotlib.pyplot as plt
import numpy as np

# Sample data
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
sales = [120, 150, 130, 170, 190, 200]
expenses = [80, 90, 85, 100, 110, 120]

# Create a figure with subplots
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(15, 4))

# Line chart
ax1.plot(months, sales, marker='o', linewidth=2, label='Sales')
ax1.plot(months, expenses, marker='s', linewidth=2, label='Expenses')
ax1.set_title('Sales vs Expenses')
ax1.set_xlabel('Months')
ax1.set_ylabel('Amount ($)')
ax1.legend()
ax1.grid(True, linestyle='--', alpha=0.7)

# Bar chart
categories = ['Product A', 'Product B', 'Product C']
revenue = [45, 30, 25]
ax2.bar(categories, revenue, color=['#04fc42', '#03c734', '#02a126'])
ax2.set_title('Product Revenue')
ax2.set_ylabel('Revenue (%)')

# Pie chart
ax3.pie(revenue, labels=categories, autopct='%1.1f%%', startangle=90)
ax3.set_title('Revenue Distribution')

plt.tight_layout()
plt.show()

# Additional visualization example
print("\n--- Statistical Visualization ---")
# Generate sample data
np.random.seed(42)
data1 = np.random.normal(50, 15, 100)
data2 = np.random.normal(70, 10, 100)

fig2, (ax4, ax5) = plt.subplots(1, 2, figsize=(12, 4))

# Histogram
ax4.hist(data1, bins=15, alpha=0.7, label='Dataset 1')
ax4.hist(data2, bins=15, alpha=0.7, label='Dataset 2')
ax4.set_title('Distribution Comparison')
ax4.set_xlabel('Values')
ax4.set_ylabel('Frequency')
ax4.legend()

# Box plot
ax5.boxplot([data1, data2])
ax5.set_xticklabels(['Dataset 1', 'Dataset 2'])  # the labels= keyword is deprecated as of Matplotlib 3.9
ax5.set_title('Box Plot Comparison')

plt.tight_layout()
plt.show()

Key Points:

Matplotlib is the foundational plotting library for Python
Use plt.subplots() to create multiple charts in one figure
Line charts are ideal for showing trends over time
Bar charts effectively compare categorical data
Pie charts show proportional relationships between categories
Histograms display the distribution of numerical data
Box plots summarize data distribution through quartiles
Always label your axes and provide clear titles for readability
Use plt.tight_layout() to automatically adjust subplot parameters
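
To make the box plot point concrete, the numbers a box plot draws can be computed directly with NumPy. This sketch assumes Matplotlib's default whisker setting (`whis=1.5`) and reuses the generator settings of "Dataset 1"; the variable names other than `data1` are illustrative.

```python
import numpy as np

# Same generator settings as "Dataset 1" in the example above.
np.random.seed(42)
data1 = np.random.normal(50, 15, 100)

# The box spans the first to third quartile; the line inside is the median.
q1, median, q3 = np.percentile(data1, [25, 50, 75])
iqr = q3 - q1

# With the default whis=1.5, whiskers reach the most extreme data points
# that still lie within 1.5 * IQR of the box edges.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
inside = data1[(data1 >= lower_fence) & (data1 <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Points beyond the fences are drawn individually as outliers.
outliers = data1[(data1 < lower_fence) | (data1 > upper_fence)]

print(f"Q1={q1:.2f}, median={median:.2f}, Q3={q3:.2f}, IQR={iqr:.2f}")
print(f"Whiskers: {whisker_low:.2f} to {whisker_high:.2f}")
print(f"Outliers: {len(outliers)}")
```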

NUMPY ARRAYS

Learn numerical computing with NumPy

Intermediate

Module 4: Numerical Computing with NumPy

NumPy is the fundamental package for scientific computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays efficiently.

Try It Yourself:

Python

import numpy as np

print("--- NumPy Array Basics ---")
# Creating arrays
arr1 = np.array([1, 2, 3, 4, 5])
arr2 = np.array([[1, 2, 3], [4, 5, 6]])

print(f"1D Array: {arr1}")
print(f"2D Array:\n{arr2}")
print(f"Shape of 2D array: {arr2.shape}")
print(f"Data type: {arr2.dtype}")

# Array creation functions
print("\n--- Special Arrays ---")
zeros_arr = np.zeros((3, 3))
ones_arr = np.ones((2, 4))
range_arr = np.arange(0, 10, 2)  # Start, stop, step
random_arr = np.random.rand(3, 3)  # 3x3 random values

print(f"Zeros array:\n{zeros_arr}")
print(f"Ones array:\n{ones_arr}")
print(f"Range array: {range_arr}")
print(f"Random array:\n{random_arr}")

# Array operations
print("\n--- Array Operations ---")
a = np.array([10, 20, 30, 40])
b = np.array([2, 3, 4, 5])

print(f"Array a: {a}")
print(f"Array b: {b}")
print(f"a + b = {a + b}")
print(f"a * b = {a * b}")
print(f"a / b = {a / b}")
print(f"a ** 2 = {a ** 2}")

# Statistical operations
print("\n--- Statistical Operations ---")
data = np.random.normal(50, 10, 100)  # Normal distribution

print(f"Mean: {np.mean(data):.2f}")
print(f"Median: {np.median(data):.2f}")
print(f"Standard Deviation: {np.std(data):.2f}")
print(f"Minimum: {np.min(data):.2f}")
print(f"Maximum: {np.max(data):.2f}")

# Array indexing and slicing
print("\n--- Array Indexing and Slicing ---")
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"Original matrix:\n{matrix}")
print(f"Element at [1,2]: {matrix[1, 2]}")
print(f"First row: {matrix[0, :]}")
print(f"Second column: {matrix[:, 1]}")
print(f"Submatrix (first 2 rows, last 2 columns):\n{matrix[:2, 1:]}")

Key Points:

NumPy arrays are more efficient than Python lists for numerical operations
Arrays can have multiple dimensions (1D, 2D, 3D, etc.)
Use np.array() to create arrays from Python lists
np.zeros(), np.ones(), and np.arange() create arrays with specific patterns
NumPy supports element-wise operations on arrays (no need for loops)
Arrays have a shape attribute that describes their dimensions
Indexing and slicing work similarly to Python lists but with multiple dimensions
NumPy provides extensive mathematical and statistical functions
NumPy arrays are the foundation for many other data science libraries
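
One place where array slicing differs from list slicing is worth a quick sketch: a list slice is a copy, while a basic array slice is a view onto the same data. The names below (`numbers`, `view`, `independent`, `prices`) are illustrative.

```python
import numpy as np

# Slicing a list copies the selected elements.
numbers = [1, 2, 3, 4]
list_slice = numbers[:2]
list_slice[0] = 99          # numbers is unchanged

# Slicing an array returns a view onto the same data.
arr = np.array([1, 2, 3, 4])
view = arr[:2]
view[0] = 99                # arr[0] is now 99 as well

# .copy() gives an independent array.
independent = arr[:2].copy()
independent[1] = -1         # arr is unchanged by this

# Element-wise arithmetic replaces an explicit loop.
prices = np.array([10.0, 20.0, 30.0])
with_tax = prices * 1.2

print(numbers)   # [1, 2, 3, 4]
print(arr)       # [99  2  3  4]
print(with_tax)
```

Calling `.copy()` is the usual way to avoid modifying the original array by accident.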

MACHINE LEARNING

Introduction to machine learning algorithms

Advanced

Module 5: Machine Learning Fundamentals

Machine learning enables computers to learn from data without being explicitly programmed. This module introduces key concepts and algorithms using scikit-learn, Python's premier machine learning library.

Try It Yourself:

Python

# Machine Learning Example with Scikit-learn
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import numpy as np
import matplotlib.pyplot as plt

print("--- Linear Regression Example ---")
# Generate sample data
np.random.seed(42)
X = 2 * np.random.rand(100, 1)  # Feature
y = 4 + 3 * X + np.random.randn(100, 1)  # Target with noise

print(f"Feature shape: {X.shape}")
print(f"Target shape: {y.shape}")

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(f"Training set size: {X_train.shape[0]}")
print(f"Testing set size: {X_test.shape[0]}")

# Create and train the model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Model Coefficients: {model.coef_[0][0]:.2f}")
print(f"Model Intercept: {model.intercept_[0]:.2f}")
print(f"Mean Squared Error: {mse:.2f}")
print(f"R-squared Score: {r2:.2f}")

# Visualization
plt.figure(figsize=(10, 6))
plt.scatter(X_test, y_test, color='blue', label='Actual data')
plt.plot(X_test, y_pred, color='red', linewidth=2, label='Regression line')
plt.xlabel('Feature (X)')
plt.ylabel('Target (y)')
plt.title('Linear Regression Results')
plt.legend()
plt.grid(True, linestyle='--', alpha=0.7)
plt.show()

# Classification Example
print("\n--- Classification Example ---")
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

# Generate classification data
X_clf, y_clf = make_classification(n_samples=100, n_features=2, n_redundant=0, 
                                   n_informative=2, n_clusters_per_class=1, 
                                   random_state=42)

# Split data
X_train_clf, X_test_clf, y_train_clf, y_test_clf = train_test_split(
    X_clf, y_clf, test_size=0.3, random_state=42)

# Train classifier
clf = LogisticRegression()
clf.fit(X_train_clf, y_train_clf)

# Predictions
y_pred_clf = clf.predict(X_test_clf)
accuracy = accuracy_score(y_test_clf, y_pred_clf)

print(f"Classification Accuracy: {accuracy:.2f}")
print("Confusion Matrix:")
print(confusion_matrix(y_test_clf, y_pred_clf))

Key Points:

Machine learning involves training algorithms to make predictions or decisions
Scikit-learn provides consistent APIs for various ML algorithms
Always split data into training and testing sets to evaluate model performance
Linear regression predicts continuous values based on input features
Logistic regression is used for classification problems; in its basic form it predicts a binary outcome
Mean squared error measures the average squared difference between predicted and actual values
R-squared score indicates how well the model explains the variance in the data
Accuracy score measures classification performance
Confusion matrix shows detailed classification results
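
The evaluation metrics listed above can be written out by hand, which may help when reading scikit-learn's output. This is a sketch on small hypothetical arrays, using plain NumPy rather than the scikit-learn functions; the confusion matrix follows the rows = actual, columns = predicted layout.

```python
import numpy as np

# Hypothetical regression targets and predictions.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 7.0, 10.0])

# Mean squared error: average of the squared residuals.
mse = np.mean((y_true - y_hat) ** 2)

# R-squared: 1 minus (residual sum of squares / total sum of squares).
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

# Hypothetical binary labels and predictions.
labels = np.array([0, 0, 1, 1, 1, 0])
preds = np.array([0, 1, 1, 1, 0, 0])

# Accuracy: fraction of predictions that match the labels.
accuracy = np.mean(labels == preds)

# Confusion matrix with rows = actual class, columns = predicted class.
cm = np.zeros((2, 2), dtype=int)
for actual, predicted in zip(labels, preds):
    cm[actual, predicted] += 1

print(f"MSE: {mse:.4f}, R^2: {r2:.4f}")   # MSE: 0.3750, R^2: 0.9250
print(f"Accuracy: {accuracy:.2f}")        # Accuracy: 0.67
print(cm)
```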

STATISTICAL ANALYSIS

Learn statistical methods for data analysis

Intermediate

Module 6: Statistical Analysis with Python

Statistical analysis is fundamental to data science: it helps us understand data distributions and relationships and draw inferences from samples. This module covers essential statistical concepts and their implementation in Python.

Try It Yourself:

Python

import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

print("--- Descriptive Statistics ---")
# Generate sample data
np.random.seed(42)
data = np.random.normal(100, 15, 1000)  # Normal distribution

print(f"Sample size: {len(data)}")
print(f"Mean: {np.mean(data):.2f}")
print(f"Median: {np.median(data):.2f}")
print(f"Standard Deviation: {np.std(data):.2f}")  # population SD (ddof=0, NumPy's default)
print(f"Variance: {np.var(data):.2f}")  # population variance (ddof=0)
print(f"Range: {np.ptp(data):.2f}")  # Peak-to-peak (max-min)

# Percentiles
print(f"25th Percentile: {np.percentile(data, 25):.2f}")
print(f"75th Percentile: {np.percentile(data, 75):.2f}")

# Hypothesis Testing
print("\n--- Hypothesis Testing ---")
# Two sample t-test
group1 = np.random.normal(100, 15, 50)
group2 = np.random.normal(105, 15, 50)

t_stat, p_value = stats.ttest_ind(group1, group2)
print(f"T-statistic: {t_stat:.4f}")
print(f"P-value: {p_value:.4f}")

if p_value < 0.05:
    print("Significant difference between groups (p < 0.05)")
else:
    print("No significant difference between groups (p >= 0.05)")

# Correlation Analysis
print("\n--- Correlation Analysis ---")
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = np.array([2, 4, 5, 4, 5, 7, 8, 9, 10, 12])

correlation = np.corrcoef(x, y)[0, 1]
print(f"Correlation coefficient: {correlation:.4f}")

# Probability Distributions
print("\n--- Probability Distributions ---")
# Normal distribution
x_normal = np.linspace(-4, 4, 100)
y_normal = stats.norm.pdf(x_normal)

plt.figure(figsize=(12, 4))

plt.subplot(1, 3, 1)
plt.plot(x_normal, y_normal)
plt.title('Normal Distribution')
plt.xlabel('x')
plt.ylabel('Probability Density')

# Binomial distribution
n, p = 10, 0.5
x_binom = np.arange(0, n+1)
y_binom = stats.binom.pmf(x_binom, n, p)

plt.subplot(1, 3, 2)
plt.bar(x_binom, y_binom)
plt.title('Binomial Distribution (n=10, p=0.5)')
plt.xlabel('Number of Successes')
plt.ylabel('Probability')

# Poisson distribution
lambda_param = 3
x_poisson = np.arange(0, 15)
y_poisson = stats.poisson.pmf(x_poisson, lambda_param)

plt.subplot(1, 3, 3)
plt.bar(x_poisson, y_poisson)
plt.title('Poisson Distribution (λ=3)')
plt.xlabel('Number of Events')
plt.ylabel('Probability')

plt.tight_layout()
plt.show()

# Confidence Intervals
print("\n--- Confidence Intervals ---")
sample_mean = np.mean(data)
sample_std = np.std(data, ddof=1)  # Sample standard deviation
n = len(data)

# 95% confidence interval for the mean (normal approximation, reasonable for a large sample)
ci_low, ci_high = stats.norm.interval(0.95, loc=sample_mean, scale=sample_std/np.sqrt(n))
print(f"95% Confidence Interval: ({ci_low:.2f}, {ci_high:.2f})")

Key Points:

Descriptive statistics summarize and describe data characteristics
Hypothesis testing helps determine if observed differences are statistically significant
Pearson correlation measures the strength and direction of the linear relationship between two variables, on a scale from -1 to 1
Probability distributions describe how values are distributed
The normal distribution is fundamental in statistics (bell curve)
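The binomial distribution models the number of successes in a fixed number of independent success/failure trials
The Poisson distribution models the count of events occurring in a fixed interval at a constant average rate
Confidence intervals give a range of plausible values for a population parameter, such as the mean
By convention, a p-value below 0.05 is treated as statistically significant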
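
The correlation coefficient printed in the example can also be computed from its definition. The sketch below reuses the `x` and `y` arrays from the correlation example and compares a manual calculation against `np.corrcoef`; `r_manual`, `r_numpy`, and `r_rescaled` are illustrative names.

```python
import numpy as np

# Same values as the correlation example above.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = np.array([2, 4, 5, 4, 5, 7, 8, 9, 10, 12])

# Deviations of each value from its mean.
dx = x - x.mean()
dy = y - y.mean()

# Pearson r: covariance divided by the product of the standard deviations.
# The 1/n factors cancel, so plain sums are enough.
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

r_numpy = np.corrcoef(x, y)[0, 1]
print(f"Manual: {r_manual:.4f}, np.corrcoef: {r_numpy:.4f}")

# Shifting a variable or rescaling it by a positive factor leaves r unchanged.
r_rescaled = np.corrcoef(x, 100 * y + 7)[0, 1]
print(f"After rescaling y: {r_rescaled:.4f}")
```

Because r only captures linear association, a value near 0 does not rule out a strong nonlinear relationship.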

PRACTICE PLAYGROUND

Experiment with what you've learned

Your Data Science Playground: Try Your Own Code

Use this space to experiment with everything you've learned. Try combining different data science concepts and see the results in real-time! This is your sandbox to practice and explore.

Python

# Data Science Playground
# Try your own code here!

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

print("Welcome to the Data Science Playground!")
print("Try creating your own data science projects below.")

# Example: Create a simple dataset
data = {
    'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
    'Sales': [120, 150, 130, 170, 190, 200],
    'Expenses': [80, 90, 85, 100, 110, 120]
}

df = pd.DataFrame(data)
df['Profit'] = df['Sales'] - df['Expenses']

print("\nSample Business Data:")
print(df)

# Basic analysis
print(f"\nTotal Sales: ${df['Sales'].sum()}")
print(f"Average Profit: ${df['Profit'].mean():.2f}")
print(f"Best Month: {df.loc[df['Profit'].idxmax(), 'Month']}")

# Simple visualization
plt.figure(figsize=(10, 5))
plt.plot(df['Month'], df['Sales'], marker='o', label='Sales')
plt.plot(df['Month'], df['Expenses'], marker='s', label='Expenses')
plt.plot(df['Month'], df['Profit'], marker='^', label='Profit')
plt.title('Business Performance')
plt.xlabel('Month')
plt.ylabel('Amount ($)')
plt.legend()
plt.grid(True, linestyle='--', alpha=0.7)
plt.show()

print("\nNow try creating your own data science project!")
print("Ideas:")
print("- Analyze a dataset of your choice")
print("- Build a predictive model")
print("- Create custom visualizations")
print("- Perform statistical analysis")

Challenge Exercises:

Create a dataset analyzing COVID-19 trends with:
- Daily cases and deaths
- Moving averages
- Growth rate calculations
Build a customer segmentation model using:
- K-means clustering
- Principal Component Analysis (PCA)
- Visualization of clusters
Analyze stock market data with:
- Price trends and volatility
- Correlation between different stocks
- Simple trading strategy backtesting
Create a sentiment analysis project:
- Text preprocessing
- Feature extraction
- Classification model training
Build a time series forecasting model:
- Data decomposition (trend, seasonality)
- ARIMA modeling
- Forecast evaluation
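
As a possible starting point for the moving-average and growth-rate steps in the first exercise, here is a minimal pandas sketch. The counts are made up for illustration and are not real case data; the column names (`day`, `cases`, `ma_3`, `growth`) are assumptions.

```python
import pandas as pd

# Made-up daily counts, for illustration only.
daily = pd.DataFrame({
    "day": pd.date_range("2024-01-01", periods=8, freq="D"),
    "cases": [10, 12, 15, 20, 18, 25, 30, 36],
})

# 3-day moving average; the first two rows have no full window, so they are NaN.
daily["ma_3"] = daily["cases"].rolling(window=3).mean()

# Day-over-day growth rate as a fraction (0.20 means +20%).
daily["growth"] = daily["cases"].pct_change()

print(daily)
```

A longer window (for example 7 days) smooths weekly reporting patterns at the cost of reacting more slowly to changes.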

Tips & Resources:

Use pandas for data manipulation and cleaning
Use Matplotlib and Seaborn for data visualization
Use scikit-learn for machine learning algorithms
Use NumPy for numerical computations
Always validate your models with test data
Document your analysis process and findings
