Hey! This post is about my introduction to the world of Julia. I took on the challenge of learning Julia and building something with it. Since Julia is pretty similar to Python, I made a hypothesis: can I learn Julia and be up and running with something in two days? What I realised is that if you come from a Python background and have some experience with it, learning Julia is going to be fun and breezy for you. So, here I am after my two-day rendezvous with Julia.

So, what did I use to learn Julia? Resources from Julia Academy.

What did I implement? I decided to go with one of the resources I learnt deep learning from: Neural Networks and Deep Learning

I implemented a Julia version of the Week 2 assignment of the Neural Networks and Deep Learning course.

I hope it's useful to you. It was a lot of fun and I am in love with Julia ❤

Let's begin!

using Random
using Plots
using HDF5
using Statistics
using Base.Iterators

Load dataset

  1. There are two files: train_catvnoncat.h5 & test_catvnoncat.h5
  2. According to our notation, X is of shape (num_features, num_examples) & y is a row vector of shape (1, num_examples).
  3. We write a function load_dataset() which:
    • Takes in HDF5 files
    • Converts them into Float64 arrays.
    • Reshapes them according to our notation & returns X_train, y_train, X_test, y_test
function load_dataset(train_file::String, test_file::String)
    
    X_train = convert(Array{Float64, 4}, h5read(train_file, "train_set_x"))
    y_train = convert(Array{Float64, 1}, h5read(train_file, "train_set_y"))
    
    X_test = convert(Array{Float64, 4}, h5read(test_file, "test_set_x"))
    y_test = convert(Array{Float64, 1}, h5read(test_file, "test_set_y"))
    
    num_features_train_X = size(X_train, 1) * size(X_train, 2) * size(X_train, 3)
    num_features_test_X = size(X_test, 1) * size(X_test, 2) * size(X_test, 3)
    
    X_train = reshape(X_train, (num_features_train_X, size(X_train, 4)))
    y_train = reshape(y_train, (1, size(y_train, 1)))
    
    X_test = reshape(X_test, (num_features_test_X, size(X_test, 4)))
    y_test = reshape(y_test, (1, size(y_test, 1)))
    
    X_train, y_train, X_test, y_test
    
end
load_dataset (generic function with 1 method)
X_train, y_train, X_test, y_test = load_dataset("train_catvnoncat.h5", "test_catvnoncat.h5");
@time size(X_train), size(y_train), size(X_test), size(y_test)
  0.002312 seconds (29 allocations: 2.078 KiB)
((12288, 209), (1, 209), (12288, 50), (1, 50))

Normalization

Pixel values range from 0 to 255, so dividing by 255 scales every feature to [0, 1].

X_train, X_test = X_train/255, X_test/255;
@time size(X_train), size(y_train), size(X_test), size(y_test)
  0.000012 seconds (5 allocations: 208 bytes)
((12288, 209), (1, 209), (12288, 50), (1, 50))

Mathematical expression of the algorithm:

For one example $x^{(i)}$: $$z^{(i)} = w^T x^{(i)} + b \tag{1}$$ $$\hat{y}^{(i)} = a^{(i)} = sigmoid(z^{(i)})\tag{2}$$ $$ \mathcal{L}(a^{(i)}, y^{(i)}) = - y^{(i)} \log(a^{(i)}) - (1-y^{(i)} ) \log(1-a^{(i)})\tag{3}$$

The cost is then computed by summing over all training examples: $$ J = \frac{1}{m} \sum_{i=1}^m \mathcal{L}(a^{(i)}, y^{(i)})\tag{6}$$
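
As a quick worked example (the numbers are made up): if for some example $w^T x^{(i)} + b = 0$ and $y^{(i)} = 1$, then

$$z^{(i)} = 0, \quad a^{(i)} = \sigma(0) = 0.5, \quad \mathcal{L}(a^{(i)}, y^{(i)}) = -\log(0.5) \approx 0.693$$

so a model that outputs 0.5 for every example sits at a cost of about $\log 2 \approx 0.693$.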

Sigmoid

Applies the sigmoid function to its input. It's written for a scalar, so we broadcast it with σ.(z) when z is an array.

function σ(z) 
    """
    Compute the sigmoid of z
    """
    return one(z) / (one(z) + exp(-z))
end
σ (generic function with 1 method)
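
A quick check in the REPL (the inputs are just example values); note the dot when broadcasting over an array:

σ(0.0)                   # 0.5
σ.([-1.0, 0.0, 1.0])     # ≈ [0.2689, 0.5, 0.7311]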

Zero initialization

Initialize w & b with zeros

function initialize(dim)
    """
    This function creates a vector of zeros of shape (dim, 1) for w and initializes b to 0.
    
    Argument:
    dim -- size of the w vector we want (or number of parameters in this case)
    
    Returns:
    w -- initialized vector of shape (dim, 1)
    b -- initialized scalar (corresponds to the bias)
    """
    
    w = zeros(dim, 1)
    b = 0
    
    @assert(size(w) == (dim, 1))
    @assert(isa(b, Float64) || isa(b, Int64))
    
    return w, b
end
initialize (generic function with 1 method)
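
A quick sanity check (the dimension 4 is arbitrary, just a smoke test):

w0, b0 = initialize(4)
size(w0), b0             # ((4, 1), 0)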

Forward and Backward propagation

The propagate function is the heart of the algorithm: it does the forward prop -> computes the cost -> does the back-prop.

Forward Propagation:

  • You get X
  • You compute $A = \sigma(w^T X + b) = (a^{(1)}, a^{(2)}, ..., a^{(m-1)}, a^{(m)})$
  • You calculate the cost function: $J = -\frac{1}{m}\sum_{i=1}^{m}y^{(i)}\log(a^{(i)})+(1-y^{(i)})\log(1-a^{(i)})$

Here are the two formulas you will be using:

$$ \frac{\partial J}{\partial w} = \frac{1}{m}X(A-Y)^T\tag{7}$$ $$ \frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^m (a^{(i)}-y^{(i)})\tag{8}$$

Here is something I love about Julia: you can use Unicode symbols directly as variable names 😍. Doesn't it look awesome?
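
If you haven't seen this before: in the REPL or a Jupyter cell you type the LaTeX-style name of the symbol and press TAB. A couple of examples (the variable names here are just for illustration):

# \alpha<TAB> gives α, \nabla<TAB> gives ∇, \sigma<TAB> gives σ
α = 0.005            # a learning rate, named exactly like in the math
∇J = [0.1, -0.2]     # a gradient vector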

function propagate(w, b, X, Y)
    """
    Implement the cost function and its gradient for the propagation explained above

    Arguments:
    w -- weights, an array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat) of size (1, number of examples)

    Return:
    cost -- negative log-likelihood cost for logistic regression
    dw -- gradient of the loss with respect to w, thus same shape as w
    db -- gradient of the loss with respect to b, thus same shape as b
    
    Tips:
    - Write your code step by step for the propagation
    """
    m = size(X, 2)
    
    # Forward prop
    Z = w'X .+ b
    𝐴 = σ.(Z)
    
    @assert(size(𝐴) == size(Y))
    
    # Compute cost
    𝒥 = -1 * sum(Y .* log.(𝐴) .+ (1 .- Y) .* log.(1 .- 𝐴))
    𝒥 /= m
    
    @assert(size(𝒥) == ())
    
    # Back-prop
    𝜕𝑧 = 𝐴 - Y
    @assert(size(𝜕𝑧) == size(𝐴) && size(𝜕𝑧) == size(Y))
        
    𝜕𝑤 = (1/m) * X * 𝜕𝑧'
    𝜕𝑏 = (1/m) * sum(𝜕𝑧)
    
    𝒥, Dict("𝜕𝑤" => 𝜕𝑤, "𝜕𝑏" => 𝜕𝑏)
end
propagate (generic function with 1 method)
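
Before wiring propagate into gradient descent, it helps to poke at it with tiny made-up inputs. With all-zero parameters every activation is 0.5, so the cost should come out to $\log 2 \approx 0.6931$:

w_toy, b_toy = zeros(2, 1), 0.0     # 2 features, zero-initialized
X_toy = [1.0 2.0; 3.0 4.0]          # 2 features × 2 examples (arbitrary values)
Y_toy = [1.0 0.0]

𝒥_toy, 𝛻_toy = propagate(w_toy, b_toy, X_toy, Y_toy)
𝒥_toy                               # ≈ 0.6931
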
function optimize(w, b, X, Y, num_iterations, 𝛼, print_cost)
    """
    This function optimizes w and b by running a gradient descent algorithm
    
    Arguments:
    w -- weights, an array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of shape (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat), of shape (1, number of examples)
    num_iterations -- number of iterations of the optimization loop
    learning_rate -- learning rate of the gradient descent update rule
    print_cost -- True to print the loss every 100 steps
    
    Returns:
    params -- dictionary containing the weights w and bias b
    grads -- dictionary containing the gradients of the weights and bias with respect to the cost function
    costs -- list of all the costs computed during the optimization, this will be used to plot the learning curve.
    
    Tips:
    You basically need to write down two steps and iterate through them:
        1) Calculate the cost and the gradient for the current parameters. Use propagate().
        2) Update the parameters using gradient descent rule for w and b
    """
    
    costs = Array{Float64, 2}(undef, num_iterations, 1)
    
    # Declare the gradients up front so they are still in scope after the
    # loop finishes (no need to make them global)
    𝜕𝑤, 𝜕𝑏 = zeros(size(w)), 0.0
    
    for i=1:num_iterations
        
        𝒥, 𝛻 = propagate(w, b, X, Y)
        
        𝜕𝑤, 𝜕𝑏 = 𝛻["𝜕𝑤"], 𝛻["𝜕𝑏"]
        
        w -= 𝛼 .* 𝜕𝑤
        b -= 𝛼 .* 𝜕𝑏
        
        costs[i] = 𝒥
        
        if print_cost && i % 100 == 0
            println("Cost after iteration $i = $𝒥")
        end
    end
    
    params = Dict("w" => w, "b" => b)
    grads = Dict("𝜕𝑤" => 𝜕𝑤, "𝜕𝑏" => 𝜕𝑏)
    
    params, grads, costs
    
end
optimize (generic function with 1 method)
function predict(w, b, X)
    """
    Predict whether the label is 0 or 1 using learned logistic regression parameters (w, b)
    
    Arguments:
    w -- weights, an array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)
    
    Returns:
    Y_prediction -- an array (vector) containing all predictions (0/1) for the examples in X
    """
    m = size(X, 2)
    
    𝐴 = σ.(w'X .+ b)
    
    preds = [p > 0.5 ? 1 : 0 for p in Iterators.flatten(𝐴)]
    
    preds = reshape(preds, (1, m))
            
    @assert(size(preds) == (1, m))
    
    preds
end
predict (generic function with 1 method)
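
With untrained (all-zero) parameters every activation is exactly 0.5, which is not greater than 0.5, so predict labels everything as non-cat. A toy check with the same made-up inputs as before:

predict(zeros(2, 1), 0.0, [1.0 2.0; 3.0 4.0])   # 1×2 Matrix: [0 0]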

Model

Combine all functions to train the model
Learning rate: $\alpha = 0.005$, iterations (epochs): 2000

function model(X_train, y_train, X_test, y_test, num_iterations, 𝛼, print_cost)
    """
    Builds the logistic regression model by calling the function you've implemented previously
    
    Arguments:
    X_train -- training set represented by an array of shape (num_px * num_px * 3, m_train)
    Y_train -- training labels represented by an array (vector) of shape (1, m_train)
    X_test -- test set represented by an array of shape (num_px * num_px * 3, m_test)
    Y_test -- test labels represented by an array (vector) of shape (1, m_test)
    num_iterations -- hyperparameter representing the number of iterations to optimize the parameters
    learning_rate -- hyperparameter representing the learning rate used in the update rule of optimize()
    print_cost -- Set to true to print the cost every 100 iterations
    
    Returns:
    d -- dictionary containing information about the model.
    """
    # Initialize parameters
    w, b = initialize(size(X_train, 1))
    
    # Gradient descent
    params, grads, costs = optimize(w, b, X_train, y_train, num_iterations, 𝛼, print_cost)
    
    w, b = params["w"], params["b"]
    
    preds_test = predict(w, b, X_test)
    preds_train = predict(w, b, X_train)
    
    train_acc = 100 - mean(abs.(preds_train - y_train)) * 100
    test_acc = 100 - mean(abs.(preds_test - y_test)) * 100
    
    @show train_acc
    @show test_acc
    
    d = Dict(
        "costs" => costs, 
        "test_preds" => preds_test, 
        "train_preds" => preds_train,
        "w" => w,
        "b" => b,
        "𝛼" => 𝛼,
        "num_iterations" =>  num_iterations
    )
    
    d
end
model (generic function with 1 method)
d = model(X_train, y_train, X_test, y_test, 2000, 0.005, true);
Cost after iteration 100 = 0.6059553608803026
Cost after iteration 200 = 0.46599325289797994
Cost after iteration 300 = 0.38370595981990246
Cost after iteration 400 = 0.34315419442154194
Cost after iteration 500 = 0.31158754106420994
Cost after iteration 600 = 0.285876628601981
Cost after iteration 700 = 0.26440388180712293
Cost after iteration 800 = 0.24612175591755672
Cost after iteration 900 = 0.23031699193109054
Cost after iteration 1000 = 0.21648420924669995
Cost after iteration 1100 = 0.20425345186200097
Cost after iteration 1200 = 0.1933464382717651
Cost after iteration 1300 = 0.1835489826274778
Cost after iteration 1400 = 0.17469296430212788
Cost after iteration 1500 = 0.16664416505116358
Cost after iteration 1600 = 0.15929383882436224
Cost after iteration 1700 = 0.15255272889996435
Cost after iteration 1800 = 0.1463467326141669
Cost after iteration 1900 = 0.14061370137325302
Cost after iteration 2000 = 0.1353010391796091
train_acc = 99.04306220095694
test_acc = 70.0

Plotting

x = 1:2000;
y = d["costs"];

gr() # backend

plot(x, y, title = "Learning rate = 0.005", label="negative log-likelihood")
xlabel!("iteration")
ylabel!("cost")

Epilogue

Okay, you can see that the model is clearly overfitting the training data 😂🤣. Training accuracy is close to 100%. This is a good sanity check: our model is working and has high enough capacity to fit the training data. Test accuracy is 70%, which is actually not bad for this simple model, given the small dataset we used and that logistic regression is a linear classifier. We could reduce the overfitting further by using regularization and so on.
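
For instance, L2 regularization would only need two small changes to propagate: a penalty term in the cost and a matching term in 𝜕𝑤. A minimal sketch (λ is a hypothetical regularization strength, not something used anywhere in this post):

function propagate_l2(w, b, X, Y, λ)
    m = size(X, 2)
    𝐴 = σ.(w'X .+ b)
    # cross-entropy cost plus the L2 penalty (λ / 2m) · Σ w²
    𝒥 = -sum(Y .* log.(𝐴) .+ (1 .- Y) .* log.(1 .- 𝐴)) / m + (λ / (2 * m)) * sum(w .^ 2)
    𝜕𝑧 = 𝐴 - Y
    𝜕𝑤 = (1 / m) * X * 𝜕𝑧' .+ (λ / m) .* w    # extra (λ/m)·w term from the penalty
    𝜕𝑏 = (1 / m) * sum(𝜕𝑧)
    𝒥, Dict("𝜕𝑤" => 𝜕𝑤, "𝜕𝑏" => 𝜕𝑏)
end

The gradient descent update in optimize would stay exactly the same; only the gradients change.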

GitHub repo for the notebook: Julia_ML

As always, thank you for reading 😊😃!