Tag: Deep Learning

  • Deep Cross Networks Explained – An Evolution of Feed Forward Networks

    Deep Learning has revolutionized various sectors lately. One notable component of this progress is the emergence of the Deep Cross Network (DCN), a neural network architecture that extends the traditional feed-forward network to offer more robust and efficient solutions for high-dimensional, sparse problems. This article aims to provide an in-depth understanding of the Deep Cross Network, its differences from traditional feed-forward networks, and the areas of its application.

    Understanding Deep Cross Network (DCN)

    The Deep Cross Network (DCN) is a hybrid model that combines the strengths of deep neural networks (DNNs) and explicit feature crossing. It was introduced to handle high-dimensional sparse data more efficiently: a deep network learns non-linear input-output mappings, while a cross network captures explicit interactions between the feature dimensions.

    The core idea behind DCN is to apply explicit and efficient feature crossing in an input space. This is done by using a cross network that applies a cross operation on the input features to learn explicit bounded-degree feature interactions, which is then combined with a deep network that models arbitrary interactions.
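
    To make the cross operation concrete, below is a rough Keras sketch of a single cross layer, following the published formulation x_{l+1} = x_0 (x_l^T w_l) + b_l + x_l; the layer name, sizes, and wiring are my own illustration rather than a reference implementation.

    import tensorflow as tf

    class CrossLayer(tf.keras.layers.Layer):
        # One cross layer: x_{l+1} = x0 * (x_l . w) + b + x_l
        def build(self, input_shape):
            dim = input_shape[0][-1]
            self.w = self.add_weight(shape=(dim, 1), initializer="glorot_uniform", name="w")
            self.b = self.add_weight(shape=(dim,), initializer="zeros", name="b")

        def call(self, inputs):
            x0, xl = inputs                    # both of shape (batch_size, dim)
            xl_w = tf.matmul(xl, self.w)       # (batch_size, 1)
            return x0 * xl_w + self.b + xl     # (batch_size, dim)

    # Combine a small cross network with a deep network (sizes here are arbitrary)
    inputs = tf.keras.Input(shape=(32,))
    cross = inputs
    for _ in range(3):                         # 3 cross layers -> interactions up to degree 4
        cross = CrossLayer()([inputs, cross])
    deep = tf.keras.layers.Dense(64, activation="relu")(inputs)
    combined = tf.keras.layers.Concatenate()([cross, deep])
    output = tf.keras.layers.Dense(1, activation="sigmoid")(combined)
    model = tf.keras.Model(inputs, output)

    Stacking l cross layers captures feature interactions up to degree l + 1, while the deep branch keeps modelling arbitrary non-linear interactions; the two outputs are concatenated before the final prediction layer.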

    How DCN Differs From Feed-Forward Networks

    Feed-forward networks or Multilayer Perceptrons (MLPs) are the simplest type of artificial neural network. In these networks, data moves in one direction—from the input layer, through the hidden layers, and finally to the output layer. There is no looping or cycling back of data.

    DCN, on the other hand, does not rely on depth alone to discover interactions. Its cross network performs explicit feature crossing: at every cross layer the original input is multiplied back into the current representation, so interactions of a known, bounded degree are learned directly rather than implicitly. This combination of explicit, bounded-degree feature crossing with a deep network gives DCN its unique strength.

    Another notable difference lies in the complexity and efficiency of the two models. While feed-forward networks can become computationally expensive and complex as they increase in size and depth, DCN handles high-dimensional sparse input effectively and efficiently, thanks to its architecture.

    Another distinguishing feature is the ability of DCN to model feature interactions. While standard feed-forward networks can struggle to capture intricate feature interactions without substantial depth, DCN excels at learning both low- and high-order feature interactions effectively.

    Applications of Deep Cross Network

    The Deep Cross Network has numerous applications across various domains. Some of the most prevalent are:

    1. Recommendation Systems: DCN can effectively handle high-dimensional data and capture complex feature interactions, making it suitable for recommendation systems. It can model the interactions between users and items efficiently to provide accurate recommendations.
    2. Advertisement Click Prediction: DCN’s ability to capture high-order feature interactions makes it a perfect fit for predicting advertisement clicks. By understanding the intricate relationships between user behavior, ad characteristics, and context, it can predict the likelihood of a user clicking on an ad.
    3. Fraud Detection: In banking and finance, DCN can be used for fraud detection by effectively modeling the complex relationships between various transactions.
    4. Natural Language Processing: DCN can also be applied to various NLP tasks, such as sentiment analysis or text classification, where it can learn effective feature interactions from high-dimensional text data.

    Conclusion

    The Deep Cross Network is a significant breakthrough in the field of deep learning. Its unique combination of deep networks and feature crossing distinguishes it from traditional feed-forward networks and makes it a powerful tool for handling high-dimensional sparse data.

    Let me know in the comments if you want to go over an application of Deep Cross Networks using an example dataset.

  • GPT-4 Vision API – How to Guide

    At its DevDay conference, OpenAI announced the GPT-4 Vision API. With access to it you can build many kinds of tools, with the GPT-4 Turbo model acting as the engine. The use cases range from information retrieval to classification.

    In this article, we will go over how you can use the Vision API, how to pass multiple images in a single request, and a few tricks you should be using to improve the response.

    First, you need an OpenAI account with billing set up and some credits to use this API: unlike ChatGPT, you are charged per token rather than a flat fee, so be careful in your experiments.

    The API –

    The API consists of two parts –

    1. Header – Here you pass your authentication key and, optionally, your organisation ID
    2. Payload – This is where the meat of your request lies. The image can be passed either as a URL or as a base64-encoded string. I prefer the latter.
    # To encode the image in base64
    import base64

    # Function to encode the image as a base64 string
    def encode_image(image_path):
        with open(image_path, "rb") as image_file:
            return base64.b64encode(image_file.read()).decode('utf-8')
    
    # Path to your image
    image_path = "./sample.png"
    
    # Getting the base64 string
    base64_image = encode_image(image_path)

    Let’s look at the API format

    import requests

    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY_HERE}"
    }

    payload = {
        "model": "gpt-4-vision-preview",
        "messages": [
            {"role": <user or system>,
             "content": [{"type": <text or image_url>,
                          "text or image_url": <text or image_url>}]}
        ],
        "max_tokens": <max tokens here>
    }
    
    response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)

    Let’s take an example.
    Suppose I want a request with a system prompt and a user prompt that extracts JSON output from an image. My payload will look like this.

    payload = {
        "model": "gpt-4-vision-preview",
        "messages": [
            # First define the system prompt
            {
                "role": "system",
                "content": [
                    {
                        "type": "text",
                        "text": "You are a system that always extracts information from an image in a json_format"
                    }
                ]
            },
            # Then define the user prompt
            {
                "role": "user",
                # Under the user prompt, I pass two content items, one text and one image
                "content": [
                    {
                        "type": "text",
                        "text": """Extract the grades from this image in a structured format. Only return the output.
                                   ```
                                   [{"subject": "<subject>", "grade": "<grade>"}]
                                   ```"""
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/jpeg;base64,{base64_image}"
                        }
                    }
                ]
            }
        ],
        "max_tokens": 500  # Return no more than 500 completion tokens
    }

    The response I get from the API is exactly what I wanted.

    ```json
    [
      {"subject": "English", "grade": "A+"},
      {"subject": "Math", "grade": "B-"},
      {"subject": "Science", "grade": "B+"},
      {"subject": "History", "grade": "C+"}
    ]
    ```

    This is just one example of how, with the right prompt, we can build an information-retrieval system for images using the Vision API.
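
    The introduction also promised multiple images: as far as I know, you can pass several images in a single request simply by adding more image_url entries to the user message's content list. A quick sketch, reusing the encode_image helper from above (the file paths below are placeholders):

    # Hypothetical example: two report-card pages in one request
    base64_page_1 = encode_image("./report_page_1.png")
    base64_page_2 = encode_image("./report_page_2.png")

    payload = {
        "model": "gpt-4-vision-preview",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text",
                     "text": "Extract the grades from both pages into a single JSON list."},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{base64_page_1}"}},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{base64_page_2}"}}
                ]
            }
        ],
        "max_tokens": 500
    }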

    In the next article, we will build an image classifier using nothing but this API, no machine learning knowledge required, and aim for state-of-the-art results.

  • Linear Regression with Keras

    One often thinks of deep learning only for problems like text or image classification, or for tasks like segmentation and language modelling. But you can also do simple linear regression with deep learning libraries. I've also attached the GitHub Gist in case you want to explore the working notebook.

    In this post I'll go over the model and explain how you can do linear regression with Keras.

    In Keras, it can be implemented using the Sequential model and the Dense layer. Here’s an example of how to implement linear regression with Keras:

    First we take a toy regression problem from scikit-learn datasets.

    from sklearn.datasets import load_diabetes
    from sklearn.model_selection import train_test_split
    
    X,y = load_diabetes(return_X_y=True)
    
    X_train, X_test, y_train, y_test = train_test_split(X,y, train_size=0.8)
    

    Now we need to define the model in Keras. That is actually very simple: you just take a Sequential model with a single Dense layer. The activation for this layer will be linear, since we're building a linear model, and the loss will be mean squared error.

    import numpy as np
    from keras.models import Sequential
    from keras.layers import Dense
    
    # define the model
    model = Sequential()
    model.add(Dense(units=1, activation='linear'))
    # compile the model
    model.compile(optimizer='sgd', loss='mean_squared_error', metrics = ['mae'])
    
    #fit the model
    model.fit(x=X_train, y=y_train, validation_data=(X_test,y_test), 
              epochs=100, batch_size=128)
    

    That's it; all that is left is to call model.predict(X_test).
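
    A quick way to sanity-check the fit on the held-out split (mean_absolute_error is scikit-learn's metric, not something used earlier in this post):

    from sklearn.metrics import mean_absolute_error

    # Predict on the test split and report the error
    y_pred = model.predict(X_test).flatten()
    print("Test MAE:", mean_absolute_error(y_test, y_pred))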

    You can find the GitHub Gist below.

  • I asked ChatGPT to write a language model

    I asked ChatGPT to write a language model. Here is the code that it returned.

    from keras.layers import Embedding, LSTM, Dense
    from keras.models import Sequential
    
    # Define the model
    model = Sequential()
    model.add(Embedding(input_dim=vocab_size, output_dim=embedding_size, input_length=max_length))
    model.add(LSTM(units=hidden_size))
    model.add(Dense(units=vocab_size, activation='softmax'))
    
    # Compile the model
    model.compile(optimizer='adam', loss='categorical_crossentropy')
    
    # Fit the model to the training data
    model.fit(x_train, y_train, batch_size=batch_size, epochs=num_epochs)
    
    

    So I decided to build a language model using it, but first I had to write a couple of lines of code myself, starting with the Tokenizer.

    class Tokenizer():
        def __init__(self, 
                     oov_token ='<unk>',
                     pad_token ='<pad>'):
            self.vocab = {}
            self.reverse_vocab = {}
            self.oov_token = oov_token
            self.pad_token = pad_token
            self.__add_to_dict(self.oov_token)
            self.__add_to_dict(self.pad_token)
            for i in range(10):
                self.__add_to_dict(str(i))
            for i in range(26):
                self.__add_to_dict(chr(ord('a') + i))
    
            # Add space and punctuation to the dictionary
            self.__add_to_dict('.')
            self.__add_to_dict(' ')
        
        def __add_to_dict(self, character):
            if character not in self.vocab:
                self.vocab[character] = len(self.vocab)
                self.reverse_vocab[self.vocab[character]] = character
            
        def tokenize(self, text):
            # Fall back to the OOV token for characters not in the vocabulary
            return [self.vocab.get(c, self.vocab[self.oov_token]) for c in text]
    
        def detokenize(self, text):
            return [self.reverse_vocab[c] for c in text]
        
        def get_vocabulary(self):
            return self.vocab
        
        def vocabulary_size(self):
            return len(self.vocab)
        
        def token_to_id(self,character):
            return self.vocab[character]
        
        def id_to_token(self , token):
            return self.reverse_vocab[token]
        
        def pad_seq(self,seq, max_len):
            return seq[:max_len] + [self.token_to_id(self.pad_token)]*(max_len-len(seq))
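
    As a quick sanity check (the exact ids depend on the insertion order above), the tokenizer round-trips a sentence like this:

    t = Tokenizer()
    ids = t.tokenize("the sky is blue.")
    print(ids)                              # character ids from the vocabulary
    print("".join(t.detokenize(ids)))       # the sky is blue.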
    

    Then I added the config, created a small corpus of text, and prepared the training data needed to train the model. To prepare the training data I also asked ChatGPT how the corpus should be structured and built it the way it suggested.

    import numpy as np

    t = Tokenizer()
    vocab_size = t.vocabulary_size()
    embedding_size = 64
    max_length = vocab_size
    num_epochs = 50
    batch_size = 16
    
    corpus = ["this is a dog",
             "dogs live with humans", 
             "they are called what is known as pets",
             "cats are also pets",
             "there are also wolves in the jungle",
             "there are many animals in the jungle",
             "the lion is called the king of the jungle",
             "the largest animal in the jungle is the elephant",
             "we have horses also there",
             "foxes are quite quick and they are also very clever",
             "the zebra also looks like a horse but is not domesticated",
             "we go to the zoo to look at monkeys",
             "we all live together on this earth",
             'monkeys like bananas',
             'pandas eat bamboo',
             'tigers are dangerous',
             'zebras have stripes',
             'lions are the kings of the savannah',
             "it is upon humans to take care of this planet"]
    tokenized_corpus = [t.tokenize(line) for line in corpus]
    #Padding the tokenized corpus
    x_train = np.array([t.pad_seq(sentence, max_length) for sentence in tokenized_corpus])
    
    y_train = np.array([np.roll(sentence, -1) for sentence in x_train])
    
    
    

    It also had a mistake in the code, which I corrected, and then I ran the training loop. You can find the full code here on my GitHub.

    I then asked ChatGPT to write the inference code for the model. Here is what it returned.

    def generate_text(seed_text, model, max_gen_length, tokenizer):
        # Tokenize the seed text
        seed_text = tokenizer.tokenize(seed_text)
        # Pad the seed text to the maximum length
        seed_text = tokenizer.pad_seq(seed_text, max_length=30)
        # Initialize an empty list to store the generated words
        generated_text = []
        # Append the seed text to the generated text
        generated_text.extend(seed_text)
        # Loop for the specified number of words to generate
        for i in range(max_gen_length):
            # Use the model to predict the next word
            next_word_probs = model.predict(np.array([seed_text]))[0]
            # Get the index of the most probable next word
            next_word_idx = np.argmax(next_word_probs)
            # Append the generated word to the generated text
            generated_text.append(next_word_idx)
            # update the seed text
            seed_text = np.delete(seed_text, 0)
            seed_text = np.append(seed_text, next_word_idx)
        # Convert the generated text from indices to words
        generated_text = [tokenizer.id_to_token(word) for word in generated_text]
        return "".join(generated_text)
    
    # Initialize the seed text
    seed_text = "The sky is"
    # Generate new text
    generated_text = generate_text(seed_text, model, max_gen_length=10, tokenizer=tokenizer)
    print(generated_text)
    
    

    After making a few changes to the code to suit our tokenizer class and model, here are a few inputs and outputs.

    Input - the sky is
    Output - the sky is<pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad>4444444444
    Input - "lion is the king of the jungle"
    Output - lion is the king of the jungle<pad><pad><pad><pad><pad><pad><pad><pad><pad><pad>4444444444

    Sure, the output is terrible, but remember this is a very basic model architecture, and we haven't used transformers or temperature-based sampling to improve our language model. In future posts, I'll use ChatGPT to build upon these blocks to train bigger and more complex language models.

    This shows how ChatGPT and similar large language models can help developers write code and build models in a short amount of time.

  • ReLU vs Leaky ReLU, when to use what

    ReLU (Rectified Linear Unit) and Leaky ReLU are both types of activation functions used in neural networks.

    ReLU

    ReLU is defined as f(x) = max(0, x), where x is the input to the function. It sets all negative input values to zero while allowing all non-negative values to pass through unchanged. This can help speed up training and improve performance: the function is cheap to compute, it does not saturate for positive inputs, and zeroing out negative activations makes the network's representations sparse.

    Leaky ReLU

    Leaky ReLU is an extension of ReLU that aims to address the problem of “dying ReLUs”, in which some neurons never activate because the gradient is zero for all negative inputs. It can be defined mathematically as f(x) = max(x, kx), where k is a small positive coefficient (typically around 0.01) applied to negative inputs, so they are scaled down to small values rather than set to zero as in a standard ReLU.
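
    A minimal NumPy sketch of the two functions (the slope k = 0.01 is just the common default):

    import numpy as np

    def relu(x):
        # Negative inputs are zeroed out
        return np.maximum(0, x)

    def leaky_relu(x, k=0.01):
        # Negative inputs keep a small slope k instead of being zeroed
        return np.maximum(x, k * x)

    x = np.array([-2.0, -0.5, 0.0, 1.5])
    print(relu(x))        # [0.    0.    0.    1.5]
    print(leaky_relu(x))  # [-0.02  -0.005  0.    1.5]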

    In practice, Leaky ReLU is used as a generalization of ReLU. The small slope for negative inputs helps avoid the dying-ReLU problem, and it can help a network train faster, since the gradients for negative inputs are no longer zero. A common rule of thumb when choosing between the two: if the inputs are not sparse and the dataset is not too small, Leaky ReLU may give a more accurate model; if the inputs are sparse and/or the dataset is small, ReLU is often the better choice.

    It also depends on the dataset and, to some extent, personal preference: Leaky ReLU works better in some cases and ReLU in others. It's important to try different activation functions and see which one gives the best performance on your dataset.

  • Why Tanh is a better activation function than sigmoid?

    You might be asked why, in neural networks, tanh is often considered a better activation function than sigmoid.

    Sigmoid: f(x) = 1 / (1 + e^(-x))
    Tanh: f(x) = (e^x - e^(-x)) / (e^x + e^(-x))

    Andrew Ng also mentions in his Deep Learning Specialization course that tanh is almost always a better activation function than sigmoid. So why is that the case?

    There are a few reasons why the hyperbolic tangent (tanh) function is often considered to be a better activation function than the sigmoid function:

    1. The output of the tanh function is centered around zero, which means that the negative inputs will be mapped to negative values and the positive inputs will be mapped to positive values. This makes the learning process of the network more stable.
    2. The tanh function has a derivative that is well-behaved, meaning that it is relatively easy to compute and the gradient is relatively stable.
    3. The sigmoid function, on the other hand, maps all inputs to values between 0 and 1. This can cause the network to become saturated, and the gradients can become very small, making the network difficult to train.
    4. Another reason is that the range of the tanh function is [-1, 1], while the range of the sigmoid function is [0, 1]; tanh therefore keeps activations roughly zero-centred, so downstream layers are less likely to saturate and more likely to have good gradient flow, leading to faster convergence.
    5. A related practical point concerns the size of the gradient itself: the derivative of tanh peaks at 1 (at x = 0), while the derivative of sigmoid peaks at only 0.25, so gradients shrink more slowly when back-propagated through tanh layers. A small numerical check of this follows below.
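
    Here is a minimal NumPy check of the gradient sizes mentioned in points 3 to 5, using the closed-form derivatives sigmoid'(x) = s(x)(1 - s(x)) and tanh'(x) = 1 - tanh(x)^2:

    import numpy as np

    def sigmoid(x):
        return 1 / (1 + np.exp(-x))

    x = np.array([-2.0, 0.0, 2.0])

    # Derivatives from the closed forms above
    sigmoid_grad = sigmoid(x) * (1 - sigmoid(x))
    tanh_grad = 1 - np.tanh(x) ** 2

    print(sigmoid_grad)  # [0.105  0.25   0.105]  -> peaks at 0.25
    print(tanh_grad)     # [0.071  1.     0.071]  -> peaks at 1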

    All that being said, whether to use sigmoid or tanh depends on the specific problem and context, and it’s not always the case that one is clearly better than the other.