<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Kiarash Kiani]]></title><description><![CDATA[I am a highly motivated Data Scientist/Machine Learning Engineer who strives to apply machine learning and data science to real-world production environments.]]></description><link>https://kiani.info</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1649687534139/we1ll-zZ6.png</url><title>Kiarash Kiani</title><link>https://kiani.info</link></image><generator>RSS for Node</generator><lastBuildDate>Wed, 15 Apr 2026 23:30:31 GMT</lastBuildDate><atom:link href="https://kiani.info/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Introducing WordMentor: Learn Vocabularies from Popular Authors]]></title><description><![CDATA[Over the past few weeks, my team and I at DataChet worked tirelessly to craft and deliver an exciting new product: WordMentor.ai. As a language enthusiast and a proud member of the DataChef team, I am delighted to introduce you to Wordmentor—a powerf...]]></description><link>https://kiani.info/introducing-wordmentor-learn-vocabularies-from-popular-authors</link><guid isPermaLink="true">https://kiani.info/introducing-wordmentor-learn-vocabularies-from-popular-authors</guid><category><![CDATA[Applications]]></category><category><![CDATA[wordmentor]]></category><dc:creator><![CDATA[Kiarash Kiani]]></dc:creator><pubDate>Tue, 23 May 2023 12:36:01 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1684845274195/8841a1d3-b257-430b-aa36-e3c4d4a01558.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Over the past few weeks, my team and I at <a target="_blank" href="https://datachef.co">DataChet</a> worked tirelessly to craft and deliver an exciting new product: <a target="_blank" href="http://wordmentor.ai">WordMentor.ai</a>. As a language enthusiast and a proud member of the DataChef team, I am delighted to introduce you to Wordmentor—a powerful platform to discover words from the literary genius of renowned authors with a cutting-edge AI-driven method.</p>
<p>With WordMentor, we set out to create an immersive learning experience that goes beyond traditional vocabulary drills. Our team of language enthusiasts and AI experts meticulously curated a collection of captivating words from popular authors.</p>
<p>Today, WordMentor is finally live on Product Hunt and we can’t wait for you to join us on this incredible journey. Visit our product page, share your thoughts, and be part of the discussion. Let’s elevate the way we learn words and inspire each other!  </p>
<p>🔗 <strong>WordMentor on ProductHunt</strong>: <a target="_blank" href="https://www.producthunt.com/posts/wordmentor-3">https://www.producthunt.com/posts/wordmentor-3</a></p>
]]></content:encoded></item><item><title><![CDATA[Customizing VSCode for CDK: A Guide to Creating Custom Tasks]]></title><description><![CDATA[VSCode is one of the most popular IDEs among developers, and for a good reason. With its out-of-the-box functionality and extendability, it offers a versatile and customizable experience.
Tasks are a great feature that allows you to use bash commands...]]></description><link>https://kiani.info/customizing-vscode-for-cdk-a-guide-to-creating-custom-tasks</link><guid isPermaLink="true">https://kiani.info/customizing-vscode-for-cdk-a-guide-to-creating-custom-tasks</guid><category><![CDATA[vscode]]></category><category><![CDATA[VSCode Tips]]></category><category><![CDATA[aws-cdk]]></category><category><![CDATA[CDK]]></category><dc:creator><![CDATA[Kiarash Kiani]]></dc:creator><pubDate>Thu, 12 Jan 2023 17:20:07 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/-FHIdRVGets/upload/e4796dc166c11d06caf42a58865b298e.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>VSCode is one of the most popular IDEs among developers, and for a good reason. With its out-of-the-box functionality and extendability, it offers a versatile and customizable experience.</p>
<p>Tasks are a great feature that lets you turn shell commands into custom VSCode build commands. For example, you can create one-step deploy, destroy, and synth commands for CDK.</p>
<p>To get started, you need to create the <code>.vscode</code> folder at the root of your project and add <code>tasks.json</code> to that folder. In <code>tasks.json</code>, you can define a series of commands that VSCode runs to destroy, deploy, or synth your CDK code.</p>
<p>The code below shows the high-level structure of <code>tasks.json</code>:</p>
<pre><code class="lang-json">{ 
    <span class="hljs-comment">// See https://go.microsoft.com/fwlink/?LinkId=733558</span>
    <span class="hljs-comment">// for the documentation about the tasks.json format</span>
    <span class="hljs-attr">"version"</span>: <span class="hljs-string">"2.0.0"</span>,
    <span class="hljs-attr">"tasks"</span>: [],
    <span class="hljs-attr">"inputs"</span>: []
  }
</code></pre>
<p>In the <code>tasks</code> property, you can define as many tasks as you wish. For example, the synth command is the easiest to create:</p>
<pre><code class="lang-json"> {
    <span class="hljs-attr">"label"</span>: <span class="hljs-string">"Synth CDK"</span>,
    <span class="hljs-attr">"type"</span>: <span class="hljs-string">"shell"</span>,
    <span class="hljs-attr">"command"</span>: <span class="hljs-string">"npx aws-cdk synth"</span>,
    <span class="hljs-attr">"group"</span>: <span class="hljs-string">"build"</span>,
    <span class="hljs-attr">"presentation"</span>: {
        <span class="hljs-attr">"reveal"</span>: <span class="hljs-string">"always"</span>,
        <span class="hljs-attr">"panel"</span>: <span class="hljs-string">"shared"</span>,
        <span class="hljs-attr">"clear"</span>: <span class="hljs-literal">true</span>,
        <span class="hljs-attr">"focus"</span>: <span class="hljs-literal">true</span>
    }
},
</code></pre>
<p>Before diving into creating a task for the deploy command, it's worth noting that I use <a target="_blank" href="https://github.com/99designs/aws-vault">aws-vault</a> to manage my credentials and access AWS via the aws-cli. If you haven't installed it, I highly recommend doing so. You can use brew to install aws-vault:</p>
<pre><code class="lang-bash">brew install --cask aws-vault
</code></pre>
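<p>As a quick sanity check that <code>aws-vault</code> works, you can run a one-off command against one of your profiles (<code>my-profile</code> here is a hypothetical profile name):</p>
<pre><code class="lang-bash"># List S3 buckets using short-lived credentials from aws-vault
aws-vault exec my-profile -- aws s3 ls
</code></pre>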
<p>Now, let's get back to creating the configuration for the CDK deploy command. As we have multiple profiles and accounts set up in <code>aws-vault</code>, VSCode must know which account we wish to deploy our code to. To achieve this, I will leave a placeholder in the configuration for the profile name:</p>
<pre><code class="lang-json">{
    <span class="hljs-attr">"label"</span>: <span class="hljs-string">"Deploy CDK"</span>,
    <span class="hljs-attr">"type"</span>: <span class="hljs-string">"shell"</span>,
    <span class="hljs-attr">"command"</span>: <span class="hljs-string">"aws-vault exec --prompt=osascript ${input:accounts} -- npx aws-cdk deploy"</span>,
    <span class="hljs-attr">"group"</span>: <span class="hljs-string">"build"</span>,
    <span class="hljs-attr">"presentation"</span>: {
        <span class="hljs-attr">"reveal"</span>: <span class="hljs-string">"always"</span>,
        <span class="hljs-attr">"panel"</span>: <span class="hljs-string">"shared"</span>,
        <span class="hljs-attr">"clear"</span>: <span class="hljs-literal">true</span>,
        <span class="hljs-attr">"focus"</span>: <span class="hljs-literal">true</span>
    }
}
</code></pre>
<p><code>${input:accounts}</code> is the placeholder. We don't want to type the profile name, though; we want VSCode to show us a list of the available profiles to choose from. To achieve this, we can define an input in the <code>inputs</code> section of <code>tasks.json</code> that lists those profiles for us.</p>
<p>However, VSCode's built-in inputs cannot run shell commands. To fix this, you need to install <a target="_blank" href="https://marketplace.visualstudio.com/items?itemName=augustocdias.tasks-shell-input">Tasks Shell Input</a> from the marketplace.</p>
<pre><code class="lang-json">{
    <span class="hljs-attr">"id"</span>: <span class="hljs-string">"accounts"</span>,
    <span class="hljs-attr">"type"</span>: <span class="hljs-string">"command"</span>,
    <span class="hljs-attr">"command"</span>: <span class="hljs-string">"shellCommand.execute"</span>,
    <span class="hljs-attr">"args"</span>: {
        <span class="hljs-attr">"command"</span>: <span class="hljs-string">"aws configure list-profiles"</span>
    }
}
</code></pre>
<p>We can use the same approach for the CDK destroy command as well. The whole file should look like this:</p>
<pre><code class="lang-json">{
    <span class="hljs-comment">// See https://go.microsoft.com/fwlink/?LinkId=733558</span>
    <span class="hljs-comment">// for the documentation about the tasks.json format</span>
    <span class="hljs-attr">"version"</span>: <span class="hljs-string">"2.0.0"</span>,
    <span class="hljs-attr">"tasks"</span>: [
        {
            <span class="hljs-attr">"label"</span>: <span class="hljs-string">"Deploy CDK"</span>,
            <span class="hljs-attr">"type"</span>: <span class="hljs-string">"shell"</span>,
            <span class="hljs-attr">"command"</span>: <span class="hljs-string">"aws-vault exec --prompt=osascript ${input:accounts} -- npx aws-cdk deploy"</span>,
            <span class="hljs-attr">"group"</span>: <span class="hljs-string">"build"</span>,
            <span class="hljs-attr">"presentation"</span>: {
                <span class="hljs-attr">"reveal"</span>: <span class="hljs-string">"always"</span>,
                <span class="hljs-attr">"panel"</span>: <span class="hljs-string">"shared"</span>,
                <span class="hljs-attr">"clear"</span>: <span class="hljs-literal">true</span>,
                <span class="hljs-attr">"focus"</span>: <span class="hljs-literal">true</span>
            }
        },
        {
            <span class="hljs-attr">"label"</span>: <span class="hljs-string">"Destroy CDK"</span>,
            <span class="hljs-attr">"type"</span>: <span class="hljs-string">"shell"</span>,
            <span class="hljs-attr">"command"</span>: <span class="hljs-string">"aws-vault exec --prompt=osascript ${input:accounts} -- npx aws-cdk destroy"</span>,
            <span class="hljs-attr">"group"</span>: <span class="hljs-string">"build"</span>,
            <span class="hljs-attr">"presentation"</span>: {
                <span class="hljs-attr">"reveal"</span>: <span class="hljs-string">"always"</span>,
                <span class="hljs-attr">"panel"</span>: <span class="hljs-string">"shared"</span>,
                <span class="hljs-attr">"clear"</span>: <span class="hljs-literal">true</span>,
                <span class="hljs-attr">"focus"</span>: <span class="hljs-literal">true</span>
            }
        },
        {
            <span class="hljs-attr">"label"</span>: <span class="hljs-string">"Synth CDK"</span>,
            <span class="hljs-attr">"type"</span>: <span class="hljs-string">"shell"</span>,
            <span class="hljs-attr">"command"</span>: <span class="hljs-string">"npx aws-cdk synth"</span>,
            <span class="hljs-attr">"group"</span>: <span class="hljs-string">"build"</span>,
            <span class="hljs-attr">"presentation"</span>: {
                <span class="hljs-attr">"reveal"</span>: <span class="hljs-string">"always"</span>,
                <span class="hljs-attr">"panel"</span>: <span class="hljs-string">"shared"</span>,
                <span class="hljs-attr">"clear"</span>: <span class="hljs-literal">true</span>,
                <span class="hljs-attr">"focus"</span>: <span class="hljs-literal">true</span>
            }
        }
    ],
    <span class="hljs-attr">"inputs"</span>: [
        {
            <span class="hljs-attr">"id"</span>: <span class="hljs-string">"accounts"</span>,
            <span class="hljs-attr">"type"</span>: <span class="hljs-string">"command"</span>,
            <span class="hljs-attr">"command"</span>: <span class="hljs-string">"shellCommand.execute"</span>,
            <span class="hljs-attr">"args"</span>: {
                <span class="hljs-attr">"command"</span>: <span class="hljs-string">"aws configure list-profiles"</span>
            }
        }
    ]
}
</code></pre>
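<p>Optionally, you can bind a task to a keyboard shortcut through <code>keybindings.json</code> (open it with the <em>Preferences: Open Keyboard Shortcuts (JSON)</em> command). The chord below is just an example; the <code>args</code> value must match the task's <code>label</code>:</p>
<pre><code class="lang-json">[
    {
        "key": "cmd+shift+d",
        "command": "workbench.action.tasks.runTask",
        "args": "Deploy CDK"
    }
]
</code></pre>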
]]></content:encoded></item><item><title><![CDATA[How to fully uninstall apps in macOS just using bash]]></title><description><![CDATA[Introduction
When you install a new application on your Mac, the installer scatters a lot of files across your system directories. Even after the installation, many apps will continue to create config and helper files in different directories. 
Since ...]]></description><link>https://kiani.info/how-to-fully-uninstall-apps-in-macos-just-using-bash</link><guid isPermaLink="true">https://kiani.info/how-to-fully-uninstall-apps-in-macos-just-using-bash</guid><category><![CDATA[macOS]]></category><category><![CDATA[Bash]]></category><category><![CDATA[tips]]></category><category><![CDATA[automation]]></category><category><![CDATA[Applications]]></category><dc:creator><![CDATA[Kiarash Kiani]]></dc:creator><pubDate>Fri, 06 May 2022 11:39:13 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/unsplash/_8gR561QtEA/upload/v1651837140168/1_Hm2RUEf.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction">Introduction</h1>
<p>When you install a new application on your Mac, the installer scatters a lot of files across your system directories. Even after the installation, many apps will continue to create config and helper files in different directories.</p>
<p>Since the Mac App Store arrived, lots of applications are available there, and uninstalling them is just a push of a button. Still, there are tons of applications that are not on the Mac App Store for different reasons, like requiring access that is not permitted by the Mac App Store guidelines.</p>
<p>Many applications like CleanMyMac have come along to solve the issue of uninstalling apps from outside the Mac App Store and to take care of completely cleaning their config, helper, and log files. Those applications are expensive for such a simple task, and not all of them are that good at finding the extra files. In this blog post, I am going to show you how I use a single-line bash script to fully uninstall any application that I no longer want.</p>
<h1 id="heading-understanding-macos-directories">Understanding macOS directories</h1>
<p>macOS uses specific directories to store user, system, and application configurations. It is crucial to understand these directories, as that helps us differentiate an application's files from the other files.</p>
<p>macOS uses <code>plist</code> files as configuration files, so that is the file type you will most probably see an application create. Apple likes to store these configurations under two directories. The first, <code>/Library</code>, holds application- and system-related files that are owned by root and available to all users. In contrast, <code>~/Library</code> holds user-level files.</p>
<h1 id="heading-the-bash-commands">The bash commands</h1>
<p>To begin with, we need to understand three bash commands; then, by the power of the bash pipe, we will put them together to create our own easy uninstaller.</p>
<h2 id="heading-1-the-mdfind-command">1. The <code>mdfind</code> command</h2>
<p>The first command to understand is <code>mdfind</code>. This command searches for files across the whole system via the Spotlight index. By passing the <code>-name</code> option, we can look for a pattern or an exact word in file names and paths.</p>
<p>Here is an example of running the <code>mdfind</code> command to find all files that have “iterm2” in their name:</p>
<pre><code class="lang-bash">mdfind -name iterm2
</code></pre>
<p>Here is an output example:</p>
<pre><code class="lang-text">/Users/kiarash/Library/Application Support/iTerm2/iterm2-daemon-1.socket.lock
/Users/kiarash/Library/HTTPStorages/com.googlecode.iterm2
/Users/kiarash/Library/Application Support/iTerm2
/usr/local/Caskroom/iterm2
/usr/local/Cellar/ncurses/6.3/share/terminfo/69/iterm2-direct
/usr/local/Cellar/ncurses/6.3/share/terminfo/69/iTerm2.app
/usr/local/Homebrew/Library/Taps/homebrew/homebrew-cask/Casks/iterm2.rb
</code></pre>
<h2 id="heading-2-the-vipe-command">2. The <code>vipe</code> command</h2>
<p>As I mentioned before, many uninstaller applications can't fully clean up the extra files because they only look for specific files in the specific directories mentioned earlier. What I like to do instead is use <code>mdfind</code> to search for all related files, open the candidates in an editor like <code>neovim</code>, edit the list, and hand it back to the bash script for removal.</p>
<p><strong>💡Note 1:</strong> If you haven't used <code>neovim</code> but <code>vim</code> instead, don't worry! <code>neovim</code> is a community-driven fork of <code>vim</code>.</p>
<p><strong>💡Note 2:</strong> If you haven't used either <code>neovim</code> or <code>vim</code> <a target="_blank" href="https://www.youtube.com/watch?v=RZ4p-saaQkc">check out this tutorial</a> or use <code>nano</code> instead.</p>
<p>To achieve that goal, I use <code>vipe</code>. This command takes the output of the first command and opens it in the default editor (here, <code>neovim</code>); after you save and quit the editor, it passes the edited list on to the second command.</p>
<p>Before using the <code>vipe</code> command, we need to install it via <code>brew</code>, as it does not ship with macOS. <code>vipe</code> is part of a formula named <code>moreutils</code>:</p>
<pre><code class="lang-bash">brew install moreutils
</code></pre>
<p>Now you can use the pattern below to pass output from one command through an editor and into another:</p>
<pre><code class="lang-bash">command1 | vipe | command2
</code></pre>
<h2 id="heading-3-the-xargs-command">3. The <code>xargs</code> command</h2>
<p>Let’s say we have successfully listed all the files related to the application we wish to remove completely. Obviously, we need the <code>rm</code> command to delete them, but each file is on a separate line, and <code>rm</code> does not accept piped output as an argument. To address this, we use the <code>xargs</code> command, which takes piped output and runs a bash command with it. The <code>-L 1</code> option processes the input line by line, i.e., it runs <code>rm</code> separately for each line. The script below demonstrates how to remove a list of files from a text file used as a source:</p>
<pre><code class="lang-bash">cat source.txt | xargs -L 1 -I {} rm -r {}
</code></pre>
<h2 id="heading-4-put-it-all-together">4. Put it all together</h2>
<p>Now that we have learned about each command, it's time to combine them into a script. Save the script below as <code>unapp.sh</code>.</p>
<pre><code class="lang-bash"><span class="hljs-meta">#!/bin/zsh</span>
mdfind -name "$1" | vipe | xargs -L 1 -I {} rm -rf {}
</code></pre>
<p>Make the script executable:</p>
<pre><code class="lang-bash">chmod +x unapp.sh
</code></pre>
<p>Now we can fully uninstall any app using the command below:</p>
<pre><code class="lang-bash">./unapp.sh iTerm2
</code></pre>
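<p>Because <code>rm -rf</code> is unforgiving, you may want a dry-run variant first. This is a sketch of the same pipeline with <code>echo</code> swapped in, so it only prints what would be deleted:</p>
<pre><code class="lang-bash">#!/bin/zsh
# Preview the files unapp.sh would delete, without removing anything
mdfind -name "$1" | vipe | xargs -L 1 -I {} echo "would remove: {}"
</code></pre>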
<h1 id="heading-references">References</h1>
<ul>
<li><a target="_blank" href="https://joeyh.name/code/moreutils/"><code>moreutils</code> documentations</a></li>
<li><a target="_blank" href="https://en.wikipedia.org/wiki/Xargs">xargs</a></li>
<li><a target="_blank" href="https://neovim.io">NeoVim home page</a></li>
</ul>
]]></content:encoded></item><item><title><![CDATA[All Important AWS SageMaker's Paths]]></title><description><![CDATA[Introduction
SageMaker is one of the central services for data scientists. SageMaker took much attention over the last few years. While many data scientists are trying to learn more about it, developers of SageMaker are constantly working on new solu...]]></description><link>https://kiani.info/all-important-aws-sagemakers-paths</link><guid isPermaLink="true">https://kiani.info/all-important-aws-sagemakers-paths</guid><category><![CDATA[AWS]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[Data Science]]></category><category><![CDATA[Docker]]></category><dc:creator><![CDATA[Kiarash Kiani]]></dc:creator><pubDate>Wed, 27 Apr 2022 06:33:55 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/unsplash/xcrI6CPkkJs/upload/v1647758759968/EWj_V0V01.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction">Introduction</h1>
<p>SageMaker is one of the central services for data scientists and has attracted much attention over the last few years. While many data scientists are trying to learn more about it, SageMaker's developers are constantly adding new solutions and capabilities. SageMaker provides tools for both production and experimentation environments.</p>
<p>When it comes to production, SageMaker benefits from the power of dockerization to create isolated, scalable compute environments and workflows. If you are not familiar with these concepts, you can learn about them from the AWS documentation for <a target="_blank" href="https://docs.aws.amazon.com/sagemaker/latest/dg/train-model.html">training</a> and <a target="_blank" href="https://docs.aws.amazon.com/sagemaker/latest/dg/processing-job.html">data processing</a> jobs.</p>
<p>Passing data to these containers is one of the challenging steps of productionizing an ML application. In this blog post, I am going to break down in detail all the paths and possible ways to send data to, or receive data from, a training or processing job container.</p>
<h1 id="heading-methods-of-loading-and-saving-datasets-in-sagemaker">Methods of loading and saving datasets in SageMaker</h1>
<p>S3 is one of the most popular AWS services for storing data, and it integrates with many other AWS services. SageMaker is no exception and provides native ways to load data from and save data to S3.</p>
<p>Normally, you might run two types of algorithms on SageMaker: a processing job and a training job. You can use <a target="_blank" href="https://sagemaker.readthedocs.io/en/stable/api/training/processing.html#sagemaker.processing.ProcessingInput">this documentation</a> to see how to load data from S3 to a SageMaker processing job, and <a target="_blank" href="https://sagemaker.readthedocs.io/en/stable/api/utility/inputs.html?highlight=sagemaker.inputs.TrainingInput#sagemaker.inputs.TrainingInput">this documentation</a> for a training job. The code below demonstrates how to define an input source for a processing job:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sagemaker.processing <span class="hljs-keyword">import</span> ProcessingInput

ProcessingInput(
     source=<span class="hljs-string">"s3://sagemaker-sample-data/processing/census/census-income.csv"</span>, 
     destination=<span class="hljs-string">"/opt/ml/processing/input"</span>
)
</code></pre>
<p>There are two options for loading data from S3 in SageMaker: File mode and Pipe mode. File mode copies the input files from an S3 bucket into the container; it is the default and the most common way of communicating with S3. In the code above, since we did not pass <code>s3_input_mode</code>, File mode is used. Pipe mode, in contrast, streams the data instead of copying it and, as a result, uses no storage inside the container.</p>
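<p>For a training job, the mode is chosen the same way. Below is a minimal sketch using the SageMaker Python SDK's <code>TrainingInput</code>, whose <code>input_mode</code> parameter accepts <code>"File"</code> (the default) or <code>"Pipe"</code>; the bucket and prefix are hypothetical:</p>
<pre><code class="lang-python">from sagemaker.inputs import TrainingInput

# Stream the training data from S3 instead of copying it into the container
train_input = TrainingInput(
    s3_data="s3://my-bucket/train/",  # hypothetical bucket/prefix
    input_mode="Pipe",
)
</code></pre>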
<p>There is one more well-known method of communicating with S3, which we will cover later in this post. But first, let us talk about the destination paths of those inputs inside the containers.</p>
<h1 id="heading-understanding-containers-paths">Understanding Container's paths</h1>
<p>When we are in a SageMaker training or processing container, we work with directories under the path <code>/opt/ml</code>. SageMaker itself also uses this directory as the root directory to save and load all files related to the training/processing algorithm. For instance, hyperparameters are stored as a JSON dictionary in <code>/opt/ml/input/config/hyperparameters.json</code>.</p>
<p>There are four situations in which you transfer data from S3 to a container or vice versa. Based on these, we can classify the paths in a container (a short sketch follows the list):</p>
<ol>
<li><strong>Input Data:</strong> The very raw data you wish to feed to your preprocessing step must be saved to <code>/opt/ml/processing/input/</code>.</li>
<li><strong>Processed Data:</strong> The output of your preprocessing step must be saved under the <code>processing</code> directory under different channels of your choice. For instance, <code>/opt/ml/processing/training</code>, <code>/opt/ml/processing/test</code>, and <code>/opt/ml/processing/validation</code>.</li>
<li><strong>Trained Model:</strong> The artifacts of your training model from the training step must be saved to <code>/opt/ml/model</code>.</li>
<li><strong>Training Output:</strong> Any extra data produced while training a model must be saved to <code>/opt/ml/output/</code> and <code>/opt/ml/output/data</code>.</li>
</ol>
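<p>Put together, a training script that respects these paths looks roughly like the sketch below; reading the hyperparameters and writing to <code>/opt/ml/model</code> are the two fixed points, while the output file name is a placeholder:</p>
<pre><code class="lang-python">import json

# SageMaker places hyperparameters at a fixed path inside the container
with open("/opt/ml/input/config/hyperparameters.json") as f:
    hyperparameters = json.load(f)

# ... train the model here ...

# Anything written under /opt/ml/model is packaged into model.tar.gz on S3
with open("/opt/ml/model/metadata.json", "w") as f:
    json.dump({"hyperparameters": hyperparameters}, f)
</code></pre>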
<h1 id="heading-using-s3-sdk-boto3">Using S3 SDK, Boto3</h1>
<p>Boto3 is the most readily available way to download or upload data to AWS S3, and it is easily accessible inside the containers. I usually don't go with this option, as it is not completely SageMaker friendly and the native input mechanisms above are a better fit. However, there are cases where I prefer to use Boto3 alongside the SageMaker native options, such as loading a large configuration. The code below demonstrates how to upload and download files with Boto3:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> boto3

s3_client = boto3.client(<span class="hljs-string">'s3'</span>)

<span class="hljs-comment"># Upload the file to S3</span>
s3_client.upload_file(<span class="hljs-string">'hello.txt'</span>, <span class="hljs-string">'MyBucket'</span>, <span class="hljs-string">'hello-remote.txt'</span>)

<span class="hljs-comment"># Download the file from S3</span>
s3_client.download_file(<span class="hljs-string">'MyBucket'</span>, <span class="hljs-string">'hello-remote.txt'</span>, <span class="hljs-string">'hello2.txt'</span>)
</code></pre>
<h1 id="heading-references">References</h1>
<ul>
<li><a target="_blank" href="https://sagemaker.readthedocs.io">SageMaker Python SDK</a></li>
<li><a target="_blank" href="https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html">SageMaker Documentations</a></li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Mathematicians' relations and communities with each other]]></title><description><![CDATA[Introduction
Graphs have taken a lot of attention during the last years, from graph machine learning methods, including Graph Neural Networks, to Graph Databases. Even on Medium, people are posting graph articles more than before. One thing that I be...]]></description><link>https://kiani.info/mathematicians-relations-and-communities-with-each-other</link><guid isPermaLink="true">https://kiani.info/mathematicians-relations-and-communities-with-each-other</guid><category><![CDATA[Data Science]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[#data visualisation]]></category><category><![CDATA[data]]></category><category><![CDATA[data analysis]]></category><dc:creator><![CDATA[Kiarash Kiani]]></dc:creator><pubDate>Mon, 11 Apr 2022 11:00:17 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/unsplash/MOO6k3RaiwE/upload/v1649415158422/5Zoi1pGCN.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction">Introduction</h1>
<p>Graphs have attracted a lot of attention over the last few years, from graph machine learning methods, including Graph Neural Networks, to graph databases. Even on Medium, people are posting more graph articles than before. One thing that I believe most data scientists underestimate is the power of graphs in visualization and storytelling.</p>
<p>Many see graphs as a complex and expensive solution for modeling data science problems. However, they provide a way to visualize data that no other chart or visualization can.</p>
<p>In this blog post, I will use the Mathematicians of Wikipedia dataset to investigate relations between mathematicians and their advisors using NetworkX to analyze and visualize.</p>
<h1 id="heading-what-is-networkx">What is NetworkX?</h1>
<p>NetworkX is one of the most popular graph frameworks to work with; I think of it as the scikit-learn of the graph world! You will most likely run into NetworkX before any other graph library when you start learning about graphs and graph frameworks.</p>
<p>NetworkX provides many graph implementations, algorithms, and methods of analysis. Even though it is too slow for many real-world applications, it still offers great functionality worth learning.</p>
<h1 id="heading-understaning-the-data">Understaning the data</h1>
<p>I found the dataset on <a target="_blank" href="https://www.kaggle.com/datasets/joephilleo/mathematicians-on-wikipedia">Kaggle</a>. The table below lists all the available features of this dataset describing mathematicians. Note that the examples do not come from the same row.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Column</td><td>Example</td></tr>
</thead>
<tbody>
<tr>
<td><mark>mathematicians</mark></td><td>Johannes Hudde</td></tr>
<tr>
<td>occupation</td><td>['civil engineer', 'engineer']</td></tr>
<tr>
<td>country of citizenship</td><td>['United States of America']</td></tr>
<tr>
<td>place of birth</td><td>New York City</td></tr>
<tr>
<td>date of death</td><td>1844</td></tr>
<tr>
<td>educated at</td><td>['Harvard University']</td></tr>
<tr>
<td>employer</td><td>['University of California, Berkeley']</td></tr>
<tr>
<td>place of death</td><td>['Paris']</td></tr>
<tr>
<td>member of</td><td>['American Mathematical Society']</td></tr>
<tr>
<td><mark>doctoral advisor</mark></td><td>['David Hilbert']</td></tr>
<tr>
<td>languages spoken, written or signed</td><td>['English']</td></tr>
<tr>
<td>academic degree</td><td>['Doctor of Sciences in Physics and Mathematics']</td></tr>
<tr>
<td>doctoral student</td><td>['Michael D. Morley']</td></tr>
<tr>
<td>manner of death</td><td>['natural causes']</td></tr>
<tr>
<td>position held</td><td>['member of the French National Assembly']</td></tr>
<tr>
<td>field of work</td><td>['number theory']</td></tr>
<tr>
<td>award received</td><td>['Fellow of the Royal Society']</td></tr>
<tr>
<td>Erdős number</td><td>['2±0']</td></tr>
<tr>
<td>instance of</td><td>['human']</td></tr>
<tr>
<td>sex or gender</td><td>['male']</td></tr>
<tr>
<td>approx. date of birth</td><td>False</td></tr>
<tr>
<td>day of birth</td><td>18</td></tr>
<tr>
<td>month of birth</td><td>January</td></tr>
<tr>
<td>year of birth</td><td>1711</td></tr>
<tr>
<td>approx. date of death</td><td>False</td></tr>
<tr>
<td>day of death</td><td>13</td></tr>
<tr>
<td>month of death</td><td>March</td></tr>
<tr>
<td>year of death</td><td>1787</td></tr>
</tbody>
</table>
</div><p>The highlighted column names (<code>mathematicians</code> and <code>doctoral advisor</code>) are the ones I will analyze to show how mathematicians are connected. My first step is to load the dataset and clean it.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd
<span class="hljs-keyword">import</span> networkx <span class="hljs-keyword">as</span> nx
<span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np

df = pd.read_csv(<span class="hljs-string">'../data/data_cleaned.csv'</span>)
</code></pre>
<p>The first problem with the data is that many of its columns, including <code>doctoral advisor</code>, are lists of names embedded as strings. As I am only interested in the <code>doctoral advisor</code> column, I will fix only this column, but you can add other column names to <code>list_type_columns</code> to fix them as well.</p>
<pre><code class="lang-python">list_type_columns = [
    <span class="hljs-string">'doctoral advisor'</span>
]

df[list_type_columns] = df[list_type_columns].fillna(<span class="hljs-string">'[]'</span>)

<span class="hljs-keyword">for</span> column <span class="hljs-keyword">in</span> list_type_columns:
    df[column] = df[column].str.replace(<span class="hljs-string">"'"</span>, <span class="hljs-string">''</span>, regex=<span class="hljs-literal">False</span>)
    df[column] = df[column].str.replace(<span class="hljs-string">"["</span>, <span class="hljs-string">''</span>, regex=<span class="hljs-literal">False</span>)
    df[column] = df[column].str.replace(<span class="hljs-string">"]"</span>, <span class="hljs-string">''</span>, regex=<span class="hljs-literal">False</span>)
    df[column] = df[column].str.split(<span class="hljs-string">','</span>)

    df = df.explode(column)
</code></pre>
<p>Now that I have handled the dataset's issues, replacing null values with empty lists and converting the string-embedded lists into real lists, I can save it to a CSV file for later use, so I won't need to reprocess the dataset every time.</p>
<pre><code class="lang-python">df.dropna(inplace=<span class="hljs-literal">True</span>)
df[[<span class="hljs-string">'mathematicians'</span>, <span class="hljs-string">'doctoral advisor'</span>]].to_csv(<span class="hljs-string">'../data/adv.csv'</span>, index=<span class="hljs-literal">False</span>)
</code></pre>
<p>I only saved the columns I needed, including <code>mathematicians</code> and <code>doctoral advisor</code>.</p>
<h1 id="heading-pagerank-algorithm">PageRank Algorithm</h1>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1649672053384/4gSDsd0gO.png" alt="PageRank from Wikipedia" /></p>
<p>PageRank computes the rank of each node based on the number of incoming edges. Google initially developed PageRank to calculate the importance of web pages.</p>
<blockquote>
<p>PageRank works by counting the number and quality of links to a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites.</p>
</blockquote>
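<p>For reference, the standard formulation (which NetworkX follows) computes the rank of a node <code>u</code> from the ranks of the nodes pointing at it, damped by a factor <code>d</code> (the <code>alpha</code> parameter below):</p>
<pre><code class="lang-latex">PR(u) = \frac{1 - d}{N} + d \sum_{v \in B_u} \frac{PR(v)}{L(v)}
</code></pre>
<p>Here <code>B_u</code> is the set of nodes linking to <code>u</code>, <code>L(v)</code> is the out-degree of <code>v</code>, and <code>N</code> is the total number of nodes.</p>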
<p>Using PageRank in NetworkX is really easy:</p>
<pre><code class="lang-python">G = nx.DiGraph(nx.path_graph(<span class="hljs-number">4</span>))
pr = nx.pagerank(G, alpha=<span class="hljs-number">0.9</span>)
</code></pre>
<h1 id="heading-graph-visualization-has-more-insights">Graph Visualization has more insights</h1>
<p>Now that we have loaded our dataset, cleaned it, and fixed its issues, let's see the top mathematicians with the most students.</p>
<pre><code class="lang-python">top_ten_advisor = df[<span class="hljs-string">'doctoral advisor'</span>].value_counts().sort_values(ascending=<span class="hljs-literal">False</span>).head(<span class="hljs-number">15</span>)
fig, ax = plt.subplots()
ax.barh(top_ten_advisor.index, top_ten_advisor.values)
ax.set_xlabel(<span class="hljs-string">'Number of Students'</span>)
ax.axvline(x=<span class="hljs-number">13</span>, c=<span class="hljs-string">'red'</span>)
plt.show()
</code></pre>
<p>output:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1649672631955/q3htDbNM-.png" alt="advisor-student-bar-chart.png" /></p>
<p>Even though the above image shows the most famous mathematicians and their students, it does not describe how they are connected and leaves lots of questions unanswered:</p>
<ol>
<li><p>Are these top advisors connected by a shared student?</p>
</li>
<li><p>Are these top advisors also advisors of each other?</p>
</li>
<li><p>Do students also advise other students?</p>
</li>
</ol>
<p>To address the questions above, a graph representation can help us. First, let's drop all the advisors with fewer than 13 students.</p>
<pre><code class="lang-python">minimum_count = <span class="hljs-number">13</span> 
df = df[df[<span class="hljs-string">'doctoral advisor'</span>].isin(df[<span class="hljs-string">'doctoral advisor'</span>].value_counts()[df[<span class="hljs-string">'doctoral advisor'</span>].value_counts() &gt;= minimum_count].index)]
</code></pre>
<p>We create our graph in <code>NetworkX</code> and calculate the PageRank scores.</p>
<pre><code class="lang-python">graph = nx.DiGraph()
graph.add_nodes_from(np.unique(df.values.flatten()))
graph.add_edges_from(df.values)

pr = nx.pagerank(graph)

names, ranks = zip(*pr.items())
pr_df = pd.DataFrame(data={<span class="hljs-string">'mathematicians'</span>: names, <span class="hljs-string">'rank'</span>: ranks})
pr_df
</code></pre>
<p>Now that we have both the graph and the ranks, we can draw it.</p>
<pre><code class="lang-python">fig = plt.figure(<span class="hljs-number">1</span>, figsize=(<span class="hljs-number">30</span>, <span class="hljs-number">20</span>), dpi=<span class="hljs-number">100</span>)

pos = nx.spring_layout(graph, k=<span class="hljs-number">1.1</span>*<span class="hljs-number">1</span>/np.sqrt(len(graph.nodes())), iterations=<span class="hljs-number">20</span>)
nx.draw(graph, node_size=pr_df[<span class="hljs-string">'rank'</span>].values*<span class="hljs-number">10000</span>, with_labels=<span class="hljs-literal">True</span>, pos=pos, edge_color=<span class="hljs-string">'gray'</span>)
plt.show()
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1649674332152/xWqXeQ5IL.png" alt="top-advisor-student-graph2.png" /></p>
<h1 id="heading-references">References</h1>
<ul>
<li><p><a target="_blank" href="https://www.kaggle.com/datasets/joephilleo/mathematicians-on-wikipedia">Mathematicians of Wikipedia</a></p>
</li>
<li><p><a target="_blank" href="https://networkx.org/documentation/stable/reference/introduction.html#">NetworkX Documentations</a></p>
</li>
<li><p><a target="_blank" href="https://en.wikipedia.org/wiki/PageRank#cite_note-1">PageRank</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Bash script automation in macOS]]></title><description><![CDATA[Introduction
There are many scenarios in which bash scripts can be used to automate processes. Sometimes, I like to test things in Python. For instance, sometimes one of my colleagues asks me if something is possible in Python or how it is done. In t...]]></description><link>https://kiani.info/bash-script-automation-in-macos</link><guid isPermaLink="true">https://kiani.info/bash-script-automation-in-macos</guid><category><![CDATA[macOS]]></category><category><![CDATA[automation]]></category><category><![CDATA[Bash]]></category><dc:creator><![CDATA[Kiarash Kiani]]></dc:creator><pubDate>Wed, 09 Feb 2022 10:01:16 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/unsplash/nfOJXUFfczQ/upload/v1644394458391/GLbxBLjfS.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction">Introduction</h1>
<p>There are many scenarios in which bash scripts can automate processes. Sometimes, I like to test things in Python; for instance, a colleague asks me whether something is possible in Python or how it is done. In those cases, I don't want to create a new project, but rather create a new folder, throw in a bunch of Python files, and figure things out. After some time, these test projects make a big mess of my directories, which I don't like.</p>
<p>To fix this, I made a bash script that removes all the files and folders I have created under "temp" every time I boot my Mac. This blog post describes how I used macOS <code>launchd</code> to schedule a bash script that deletes all the test files and folders.</p>
<h1 id="heading-what-is-launchd">What is <code>launchd</code>?</h1>
<p>As a Linux user, or at least someone who has worked with Linux, you may be familiar with <code>systemd</code>. <code>launchd</code> plays the same role, but it does more: it also works as a job scheduler (a cron replacement), and it is much more reliable than <code>systemd</code>.</p>
<h2 id="heading-1-how-to-interface-launchd">1. How to interface <code>launchd</code>?</h2>
<p>To interact with <code>launchd</code>, use <code>launchctl</code> in your shell. The same relationship as <code>systemd</code> and its <code>systemctl</code> command, isn't it? You can see more information about <code>launchctl</code> by typing <code>man launchctl</code> in your terminal window.</p>
<h2 id="heading-2-what-launchd-configuration-files-look-like">2. What <code>launchd</code> configuration files look like?</h2>
<p>Having learned about the concept of <code>launchd</code> and its interface <code>launchctl</code>, it is now time to understand how to define a new configuration. In contrast to <code>systemd</code>, which uses INI-style unit files, <code>launchd</code> uses <code>plist</code> files. Below is an example:</p>
<pre><code class="lang-plist"><span class="hljs-meta">&lt;?xml version="1.0" encoding="UTF-8"?&gt;</span>
<span class="hljs-meta">&lt;!DOCTYPE <span class="hljs-meta-keyword">plist</span> <span class="hljs-meta-keyword">PUBLIC</span> <span class="hljs-meta-string">"-//Apple//DTD PLIST 1.0//EN"</span> <span class="hljs-meta-string">"http://www.apple.com/DTDs/PropertyList-1.0.dtd"</span>&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">plist</span> <span class="hljs-attr">version</span>=<span class="hljs-string">"1.0"</span>&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">dict</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">key</span>&gt;</span>Label<span class="hljs-tag">&lt;/<span class="hljs-name">key</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">string</span>&gt;</span>com.example.hello<span class="hljs-tag">&lt;/<span class="hljs-name">string</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">key</span>&gt;</span>ProgramArguments<span class="hljs-tag">&lt;/<span class="hljs-name">key</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">array</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">string</span>&gt;</span>hello<span class="hljs-tag">&lt;/<span class="hljs-name">string</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">string</span>&gt;</span>world<span class="hljs-tag">&lt;/<span class="hljs-name">string</span>&gt;</span>
    <span class="hljs-tag">&lt;/<span class="hljs-name">array</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">key</span>&gt;</span>KeepAlive<span class="hljs-tag">&lt;/<span class="hljs-name">key</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">true</span>/&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">dict</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">plist</span>&gt;</span>
</code></pre>
<h2 id="heading-3-where-to-save-launchd-configuration-files">3. Where to save <code>launchd</code> configuration files?</h2>
<p>There are three locations in which you can store a <code>launchd</code> configuration file. If your application is supposed to run as a system-wide daemon, store it under <code>/Library/LaunchDaemons</code> (the <code>/System/Library</code> counterparts are reserved for Apple's own services). If it is supposed to run whenever any user logs in, store it under <code>/Library/LaunchAgents</code>. Finally, if it is supposed to run only when a specific user logs in, it must be stored under <code>~/Library/LaunchAgents</code>.</p>
<h1 id="heading-creating-the-automation">Creating the automation</h1>
<p>It's time to set up the automation now that we know how <code>launchd</code> works. Let's create the bash script file first:</p>
<pre><code class="lang-bash">rm -rf <span class="hljs-variable">$HOME</span>/projects/temp &amp;&amp; mkdir <span class="hljs-variable">$HOME</span>/projects/temp
</code></pre>
<p>Save this script to <code>~/.scripts/loginscript.sh</code>. It will delete the files and folders in the <code>temp</code> folder every time it runs. Next, we need to create the <code>launchd</code> config file so the script runs every time we log in.</p>
<pre><code class="lang-plist"><span class="hljs-meta">&lt;?xml version="1.0" encoding="UTF-8"?&gt;</span>
<span class="hljs-meta">&lt;!DOCTYPE <span class="hljs-meta-keyword">plist</span> <span class="hljs-meta-keyword">PUBLIC</span> <span class="hljs-meta-string">"-//Apple//DTD PLIST 1.0//EN"</span> <span class="hljs-meta-string">"http://www.apple.com/DTDs/PropertyList-1.0.dtd"</span>&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">plist</span> <span class="hljs-attr">version</span>=<span class="hljs-string">"1.0"</span>&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">dict</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">key</span>&gt;</span>Label<span class="hljs-tag">&lt;/<span class="hljs-name">key</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">string</span>&gt;</span>com.user.loginscript<span class="hljs-tag">&lt;/<span class="hljs-name">string</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">key</span>&gt;</span>ProgramArguments<span class="hljs-tag">&lt;/<span class="hljs-name">key</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">array</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">string</span>&gt;</span>/bin/zsh<span class="hljs-tag">&lt;/<span class="hljs-name">string</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">string</span>&gt;</span>PATH-TO-HOME-DIRECTORY/.scripts/loginscript.sh<span class="hljs-tag">&lt;/<span class="hljs-name">string</span>&gt;</span>
    <span class="hljs-tag">&lt;/<span class="hljs-name">array</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">key</span>&gt;</span>RunAtLoad<span class="hljs-tag">&lt;/<span class="hljs-name">key</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">true</span>/&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">dict</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">plist</span>&gt;</span>
</code></pre>
<p>Let's save this to <code>~/Library/LaunchAgents/com.user.loginscript.plist</code>. Before registering it with <code>launchd</code>, we first need to get the user ID, which <code>launchd</code> requires:</p>
<pre><code class="lang-bash">id -u
</code></pre>
<p>Copy the output and run the command below to register the configuration to <code>launchd</code>:</p>
<pre><code class="lang-bash">sudo launchctl bootstrap gui/&lt;USER-ID&gt; ~/Library/LaunchAgents/com.user.loginscript.plist
</code></pre>
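<p>To confirm the agent registered, or to remove it later, <code>launchctl</code>'s <code>print</code> and <code>bootout</code> subcommands take the same domain target (note that <code>$(id -u)</code> saves you the copy-paste):</p>
<pre><code class="lang-bash"># Inspect the loaded agent
launchctl print gui/$(id -u)/com.user.loginscript

# Unregister the agent again
launchctl bootout gui/$(id -u) ~/Library/LaunchAgents/com.user.loginscript.plist
</code></pre>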
<h1 id="heading-references">References</h1>
<ul>
<li><a target="_blank" href="https://developer.apple.com/library/archive/documentation/MacOSX/Conceptual/BPSystemStartup/Chapters/CreatingLaunchdJobs.html">Apple Documentions for Launchd</a></li>
</ul>
]]></content:encoded></item><item><title><![CDATA[How to automate pushing training docker image to ECR  for SageMaker]]></title><description><![CDATA[Introduction
There are many times that you would prefer a custom docker image over what Sagemaker provides for you. People are using some tricks to install custom packages of their needs inside those default containers, but that is not clean nor reli...]]></description><link>https://kiani.info/how-to-automate-pushing-training-docker-image-to-ecr-for-sagemaker</link><guid isPermaLink="true">https://kiani.info/how-to-automate-pushing-training-docker-image-to-ecr-for-sagemaker</guid><category><![CDATA[AWS]]></category><category><![CDATA[ECS]]></category><category><![CDATA[Docker]]></category><category><![CDATA[ci-cd]]></category><category><![CDATA[Machine Learning]]></category><dc:creator><![CDATA[Kiarash Kiani]]></dc:creator><pubDate>Wed, 09 Feb 2022 07:25:16 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/unsplash/4CNNH2KEjhc/upload/v1644390911512/iP5MEMC29.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction">Introduction</h2>
<p>There are many times when you would prefer a custom Docker image over what SageMaker provides for you. People use tricks to install the custom packages they need inside the default containers, but that is neither clean nor reliable. Building a Docker image and pushing it to Elastic Container Registry is another kind of problem: there are lots of steps and commands to memorize every time you want to change something, add a feature, or fix a bug.</p>
<p>Amazon Web Services provides lots of tools to create and build infrastructure that makes your life easier. With a combination of a Git server (like GitHub), CodePipeline, and CodeBuild, you can automate the process of building a custom Docker image and pushing it to ECR, so you can put all of your focus on designing the model 🍻!</p>
<p>In this post I’m going to use GitHub as my Git repository server, but feel free to replace it with Bitbucket or the Amazon CodeCommit service. The integration of these three systems is pictured below.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1644390839044/HMURJUv6o.png" alt="FBA9BAE4-3BEE-4C9B-8171-3C24DB669502.png" /></p>
<h2 id="heading-elastic-container-registry">Elastic Container Registry</h2>
<p>ECR is a private Docker registry where you can push and pull your images. ECR also has a public registry from which anyone can download images with no authentication required. Access policies are available under a private repository's permissions, enabling us to grant specific AWS accounts access or to define wildcard access (which means anyone can have access) for pulling images.</p>
<p>To create a new ECR repository:</p>
<ol>
<li>Go to the AWS console</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1644390974901/8-lcvOycj.png" alt="Screen_Shot_2021-07-18_at_3.39.51_PM.png" /></p>
<ol>
<li>Click on <code>Create repository</code></li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1644391073174/Ha7ppkB6T.png" alt="Screen_Shot_2021-07-18_at_3.41.49_PM.png" /></p>
<ol>
<li>Click on the view push commands to see commands to push to the repository of your interest.</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1644391104493/Q9pt2sEa2.png" alt="Screen_Shot_2021-07-18_at_3.43.00_PM.png" /></p>
<h2 id="heading-codepipeline">CodePipeline</h2>
<p>CodePipeline acts as the entry point of the architecture. By pointing a new pipeline to a Git repository and selecting a branch, the pipeline subscribes to commits and code updates. Every time you push a new commit to the chosen branch, a webhook triggers the pipeline to pull the changes and begin the process of building and pushing to ECR.</p>
<p>To begin with, you need to create a new pipeline from the AWS console. Follow the configuration provided below and leave the other settings at their default values.</p>
<h3 id="heading-step-1">Step 1</h3>
<ul>
<li><strong>Pipeline name:</strong> a logical name for the project.</li>
<li><strong>Service Role:</strong> choose New service role, so AWS creates a least-privileged role for running the pipeline.</li>
<li><strong>Role Name</strong>: the name of the role that AWS will create on your behalf.</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1644391173029/RJSsg5a3E.jpeg" alt="F459649E-3C0E-4E4D-9315-17AD809CDED0.jpeg" /></p>
<h3 id="heading-step-2">Step 2</h3>
<ul>
<li><strong>Source provider:</strong> choose the git repository service to connect to from the provided list.</li>
<li><strong>Repository name:</strong> your source repository.</li>
<li><strong>Branch name:</strong> branch of the repository to pair with.</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1644391204510/6DZi2SUL8.jpeg" alt="C52D0569-ADA9-4892-B55B-B2134D5B77B9.jpeg" /></p>
<h2 id="heading-codebuild">CodeBuild</h2>
<h3 id="heading-step-3">Step 3</h3>
<p>In this step, you can choose to connect to an existing CodeBuild project or create a new one. In a CodeBuild project we define shell instructions that build and push our Docker image from the source pulled by CodePipeline. CodePipeline is responsible for starting the build process by calling CodeBuild.</p>
<p>Since we are creating a new structure, we need to create a new CodeBuild project for this repository by hitting the <em>Create project</em> button.</p>
<blockquote>
<p>🚨 Region must be the same for all three services we are using in this architecture.</p>
</blockquote>
<ul>
<li><strong>Project name:</strong> a logical name for the project.</li>
<li><strong>Operating system:</strong> select <em>Amazon Linux 2</em></li>
<li><strong>Runtime(s):</strong> select <em>Standard</em></li>
<li><strong>Image version:</strong> select <em>Always use the latest for this runtime version</em></li>
<li><strong>Privileged:</strong> check the box so the build can run Docker and push to ECR.</li>
<li><strong>Service Role:</strong> choose New service role, so AWS creates a least-privileged role for running the build.</li>
<li><strong>Role Name</strong>: the name of the role that AWS will create on your behalf.</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1644391311854/rC5vzF1lT.jpeg" alt="1DC27349-A2AF-4C6E-A184-9A18B60208FC.jpeg" /></p>
<p>The <code>buildspec.yml</code> file needs to be created at the root of the repository so CodeBuild can automatically build and push the image. Simply replace the placeholders in the script below with the values shown in your ECR repository's <em>View push commands</em>:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">version:</span> <span class="hljs-number">0.2</span>

<span class="hljs-attr">phases:</span>
  <span class="hljs-attr">pre_build:</span>
    <span class="hljs-attr">commands:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">echo</span> <span class="hljs-string">Logging</span> <span class="hljs-string">in</span> <span class="hljs-string">to</span> <span class="hljs-string">Amazon</span> <span class="hljs-string">ECR</span> <span class="hljs-string">and</span> <span class="hljs-string">Docker</span> <span class="hljs-string">Hub</span>
  <span class="hljs-attr">build:</span>
    <span class="hljs-attr">commands:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">echo</span> <span class="hljs-string">Build</span> <span class="hljs-string">started</span> <span class="hljs-string">on</span> <span class="hljs-string">`date`</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">echo</span> <span class="hljs-string">Builing</span> <span class="hljs-string">image</span> <span class="hljs-string">from</span> <span class="hljs-string">docker</span> <span class="hljs-string">file.</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">docker</span> <span class="hljs-string">build</span> <span class="hljs-string">-t</span> <span class="hljs-string">&lt;YOUR</span> <span class="hljs-string">REPOSITOR</span> <span class="hljs-string">YNAME&gt;:latest</span> <span class="hljs-string">-f</span> <span class="hljs-string">Dockerfile</span> <span class="hljs-string">.</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">echo</span> <span class="hljs-string">Pushing</span> <span class="hljs-string">image</span> <span class="hljs-string">to</span> <span class="hljs-string">Amazon</span> <span class="hljs-string">ECR.</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">aws</span> <span class="hljs-string">ecr</span> <span class="hljs-string">get-login-password</span> <span class="hljs-string">--region</span> <span class="hljs-string">eu-west-1</span> <span class="hljs-string">|</span> <span class="hljs-string">docker</span> <span class="hljs-string">login</span> <span class="hljs-string">--username</span> <span class="hljs-string">AWS</span> <span class="hljs-string">--password-stdin</span> <span class="hljs-string">&lt;AWS</span> <span class="hljs-string">ACCOUNT</span> <span class="hljs-string">ID&gt;.dkr.ecr.eu-west-1.amazonaws.com</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">docker</span> <span class="hljs-string">tag</span> <span class="hljs-string">&lt;YOUR</span> <span class="hljs-string">REPOSITOR</span> <span class="hljs-string">YNAME&gt;:latest</span> <span class="hljs-string">&lt;YOUR</span> <span class="hljs-string">ECR</span> <span class="hljs-string">REPOSITORY</span> <span class="hljs-string">PATH&gt;:latest</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">docker</span> <span class="hljs-string">push</span> <span class="hljs-string">&lt;YOUR</span> <span class="hljs-string">ECR</span> <span class="hljs-string">REPOSITORY</span> <span class="hljs-string">PATH&gt;:latest</span>
  <span class="hljs-attr">post_build:</span>
    <span class="hljs-attr">commands:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">echo</span> <span class="hljs-string">Build</span> <span class="hljs-string">completed</span> <span class="hljs-string">on</span> <span class="hljs-string">`date`</span>
</code></pre>
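<p>Once the pipeline has pushed an image, you can point a SageMaker training job at it. Below is a minimal sketch with the SageMaker Python SDK; the account ID, repository name, and role are placeholders:</p>
<pre><code class="lang-python">from sagemaker.estimator import Estimator

# Train with the custom image that CodeBuild pushed to ECR
estimator = Estimator(
    image_uri="&lt;AWS ACCOUNT ID&gt;.dkr.ecr.eu-west-1.amazonaws.com/&lt;YOUR REPOSITORY NAME&gt;:latest",
    role="&lt;YOUR SAGEMAKER EXECUTION ROLE ARN&gt;",
    instance_count=1,
    instance_type="ml.m5.large",
)
estimator.fit()  # add input channels here if your image expects them
</code></pre>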
<h2 id="heading-summary">Summary</h2>
<p>In a nutshell, this architecture lets you build machine learning model images automatically. The automation brings more stability to your code and infrastructure because it removes human intervention from pushing images to ECR: whenever commits are pushed to the master branch, CodeBuild builds the image and pushes it to ECR.</p>
]]></content:encoded></item></channel></rss>