Am I too old to learn to code?

 


“Every great developer you know got there by solving problems they were unqualified to solve until they actually did it.” – Patrick McKenzie

As a 31-year-old mother of two, I understand the fear of coming into the technology field later in life. The other students that I worked with in my software engineering internship were 21 and 22. I did not take coding classes in high school, or before I was almost 30 for that matter. The language and structure were completely foreign to me. My first coding class (done in Python) was a struggle. I got a C in the class and left feeling like I had literally learned a foreign language, and like I was still years behind.

I was also in love. I have to admit that there have been very few instances in my learning career that have felt so intellectually gratifying.

The sense of accomplishment that can come from this type of learning is hard to explain. It’s nothing like successfully explaining the Central Dogma to a group of fine arts majors or even being able to talk about the history and philosophy of science with a PhD student. Being able to understand and explain what I learn from books about molecular biology and genetics (which I truly love) has not compared to the satisfaction of feeling like I will be the one to interpret the vast amounts of genomic data being collected as you read this.

My first “bioinformatics program” was one that simply counted the “A”, “C”, “G” and “T” bases in a text file, and spit out an occurrence percentage of each. It was a painfully simple program, but there is no denying the excitement of where it leads and what else this field offers.
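If you are curious what a program like that can look like, here is a minimal sketch in Python. It is not my original code; the function names and the example sequence are made up for illustration, and it assumes the file is plain text where anything that is not A, C, G or T can be ignored.

```python
from collections import Counter


def base_percentages(sequence):
    """Return the percentage of each base among the A/C/G/T characters found."""
    # Upper-case the text and ignore anything that is not a base (newlines, N, etc.)
    counts = Counter(base for base in sequence.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return {base: 0.0 for base in "ACGT"}
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def base_percentages_from_file(path):
    """Read a plain-text sequence file and report its base percentages."""
    with open(path) as handle:
        return base_percentages(handle.read())


# Hypothetical example sequence, just to show the output format
for base, percent in base_percentages("AACGTTTTGC").items():
    print(f"{base}: {percent:.1f}%")
```

To run it on a real file, call `base_percentages_from_file` with the path to your text file.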

Until you have spent hours on an algorithm or project hitting error after error, draining a coffee pot, sleeping with your computer, and coming out victorious, you will not understand what this world has to offer. I will reiterate that the sense of intellectual accomplishment and potential for real world data interpretation that is born from these experiences is very, very real. It is also addicting, if you can push through the initial failures.

Henry Ford designed the Model T at 45, Momofuku Ando invented instant ramen at 48, Julia Child wrote her first cookbook at age 50, and Ray Kroc joined McDonald’s at 52. Of course, there are many more stories of people who didn’t make it until they did. The key to learning something new is to follow through.
Now that you are convinced (I hope) that the reward is worth the work, I want to assure you that you are not too old to learn to code.

I hope to start introductory Python tutorials on this blog and on my YouTube channel @biocodebox, which will be very beginner friendly. Until then, a book I would recommend is Python Programming for the Absolute Beginner. It is very good and starts from the beginning, with simple programs and easy-to-understand terms. I always feel it’s nice to have a hard reference like a book to make notes in.

Accepting the challenge to learn to code will be extremely rewarding and I genuinely hope that you decide to take it on!  If I can do it with two young children while helping run a non-profit 4 days/week, going to school full time, working part time and pursuing a research project…. anyone can ;).

How to import a Blosum Matrix in Python


There is rarely only one way to do something in this field, and importing a scoring matrix is no exception. There are also many different types of scoring matrices, each better suited to different applications. The two Blosum (BLOcks SUbstitution Matrix) scoring matrices I use for protein sequence alignment are Blosum 62 and Blosum 50. Again, there are many ways to import a Blosum Matrix, but I will give you two:

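  1. You can use BioPython:
    from Bio.SubsMat import MatrixInfo
    MatrixInfo.blosum50

    Or to assign your matrix to a new variable:
    from Bio.SubsMat import MatrixInfo as matrixFile
    matrix = matrixFile.blosum62

    Note: Bio.SubsMat was deprecated and later removed from BioPython. On recent releases, the equivalent is:
    from Bio.Align import substitution_matrices
    matrix = substitution_matrices.load("BLOSUM62")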

  2. You can use “brute force” and parse a Blosum text file. After reading in the lines of the text file, I put mine into a dictionary of the form { (amino acid 1, amino acid 2) : score } for further use. The text file will look something like this (this is a Blosum50 Matrix):


A R N D C Q E G H I L K M F P S T W Y V B J Z X *
A 5 -2 -1 -2 -1 -1 -1 0 -2 -1 -2 -1 -1 -3 -1 1 0 -3 -2 0 -2 -2 -1 -1 -5
R -2 7 -1 -2 -4 1 0 -3 0 -4 -3 3 -2 -3 -3 -1 -1 -3 -1 -3 -1 -3 0 -1 -5
N -1 -1 7 2 -2 0 0 0 1 -3 -4 0 -2 -4 -2 1 0 -4 -2 -3 5 -4 0 -1 -5
D -2 -2 2 8 -4 0 2 -1 -1 -4 -4 -1 -4 -5 -1 0 -1 -5 -3 -4 6 -4 1 -1 -5
C -1 -4 -2 -4 13 -3 -3 -3 -3 -2 -2 -3 -2 -2 -4 -1 -1 -5 -3 -1 -3 -2 -3 -1 -5
Q -1 1 0 0 -3 7 2 -2 1 -3 -2 2 0 -4 -1 0 -1 -1 -1 -3 0 -3 4 -1 -5
E -1 0 0 2 -3 2 6 -3 0 -4 -3 1 -2 -3 -1 -1 -1 -3 -2 -3 1 -3 5 -1 -5
G 0 -3 0 -1 -3 -2 -3 8 -2 -4 -4 -2 -3 -4 -2 0 -2 -3 -3 -4 -1 -4 -2 -1 -5
H -2 0 1 -1 -3 1 0 -2 10 -4 -3 0 -1 -1 -2 -1 -2 -3 2 -4 0 -3 0 -1 -5
I -1 -4 -3 -4 -2 -3 -4 -4 -4 5 2 -3 2 0 -3 -3 -1 -3 -1 4 -4 4 -3 -1 -5
L -2 -3 -4 -4 -2 -2 -3 -4 -3 2 5 -3 3 1 -4 -3 -1 -2 -1 1 -4 4 -3 -1 -5
K -1 3 0 -1 -3 2 1 -2 0 -3 -3 6 -2 -4 -1 0 -1 -3 -2 -3 0 -3 1 -1 -5
M -1 -2 -2 -4 -2 0 -2 -3 -1 2 3 -2 7 0 -3 -2 -1 -1 0 1 -3 2 -1 -1 -5
F -3 -3 -4 -5 -2 -4 -3 -4 -1 0 1 -4 0 8 -4 -3 -2 1 4 -1 -4 1 -4 -1 -5
P -1 -3 -2 -1 -4 -1 -1 -2 -2 -3 -4 -1 -3 -4 10 -1 -1 -4 -3 -3 -2 -3 -1 -1 -5
S 1 -1 1 0 -1 0 -1 0 -1 -3 -3 0 -2 -3 -1 5 2 -4 -2 -2 0 -3 0 -1 -5
T 0 -1 0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1 2 5 -3 -2 0 0 -1 -1 -1 -5
W -3 -3 -4 -5 -5 -1 -3 -3 -3 -3 -2 -3 -1 1 -4 -4 -3 15 2 -3 -5 -2 -2 -1 -5
Y -2 -1 -2 -3 -3 -1 -2 -3 2 -1 -1 -2 0 4 -3 -2 -2 2 8 -1 -3 -1 -2 -1 -5
V 0 -3 -3 -4 -1 -3 -3 -4 -4 4 1 -3 1 -1 -3 -2 0 -3 -1 5 -3 2 -3 -1 -5
B -2 -1 5 6 -3 0 1 -1 0 -4 -4 0 -3 -4 -2 0 0 -5 -3 -3 6 -4 1 -1 -5
J -2 -3 -4 -4 -2 -3 -3 -4 -3 4 4 -3 2 1 -3 -3 -1 -2 -1 2 -4 4 -3 -1 -5
Z -1 0 0 1 -3 4 5 -2 0 -3 -3 1 -1 -4 -1 0 -1 -2 -2 -3 1 -3 5 -1 -5
X -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -5
* -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 1

Using argparse, here is the Python code for importing the above Blosum50 text file into a referenceable dictionary.

import argparse


def populate_matrix(matrixFile):
    '''
    Reads in a Blosum matrix txt file and sets key: value pairs
    as (amino acid i, amino acid j): score
    '''
    # Keep only non-blank lines; the first one is the header of column labels
    lines = [line for line in matrixFile if line.strip()]
    aminoacidstring = lines[0].split()

    dictaa = {}
    for line in lines[1:]:
        row = line.split()
        row_aa = row[0]  # the first item in each row is the row label
        # row[1:] holds every score in the row, including the final "*" column
        for col_aa, score in zip(aminoacidstring, row[1:]):
            dictaa[row_aa, col_aa] = int(score)  # store scores as integers

    return dictaa


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-m', dest='matrixFile', required=True)
    args = parser.parse_args()

    # The with statement closes the file when the block ends
    with open(args.matrixFile, 'r') as matrixFile:
        blosum_dict = populate_matrix(matrixFile)

From here, you can read in protein sequences and get the scores for each amino acid match using the dictionary you created, and likely do some more calculating, but this should get you started!
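As a rough sketch of that next step, the example below adds up the scores for two sequences of the same length, position by position. To keep it self-contained it uses a small hand-typed subset of the Blosum50 values shown above instead of the parsed file, and the names `blosum_subset` and `score_ungapped` are made up for illustration. This is not the Smith-Waterman algorithm (there are no gaps and no local alignment), just the dictionary lookup it builds on.

```python
# A few Blosum50 entries copied from the matrix shown earlier; in practice
# this dictionary would come from parsing the whole text file.
blosum_subset = {
    ("A", "A"): 5, ("A", "R"): -2, ("A", "N"): -1,
    ("R", "A"): -2, ("R", "R"): 7, ("R", "N"): -1,
    ("N", "A"): -1, ("N", "R"): -1, ("N", "N"): 7,
}


def score_ungapped(seq1, seq2, matrix):
    """Sum the substitution scores of two equal-length sequences, position by position."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be the same length for an ungapped comparison")
    # Each (a, b) pair is looked up as a tuple key, e.g. matrix["A", "R"]
    return sum(matrix[a, b] for a, b in zip(seq1, seq2))


print(score_ungapped("ARN", "ARN", blosum_subset))  # 5 + 7 + 7 = 19
print(score_ungapped("ARN", "RRA", blosum_subset))  # -2 + 7 + -1 = 4
```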

The full Smith-Waterman alignment project is available on GitHub:
https://github.com/cjthomasson/SmithWaterman. This project finds the alignment and max score for two protein sequences.

What’s the point, Grandma? (APIs and endpoints)


The best way to KNOW that you know something is to be able to explain it to your grandma.  Today, I will teach you about endpoints, Grandma!

Firstly, let’s talk about what an API is. API stands for Application Programming Interface; it allows one piece of software to talk to another piece of software. Let’s start with the farm analogy, since you grew up on a farm, Grandma.

Let’s say that your farm’s name is Back End Farms (this represents the back-end/server side). At Back End Farms you sell eggs. In our analogy, the people that enjoy your eggs represent the users or client. You sell eggs to local restaurants because someone let it slip that you make the best omelets in the state, and once the word got out, your requests for eggs sky-rocketed. In order to fulfill the egg supply requirements, you had to buy more hens, keep track of orders, balance your checkbook ( <– Grandma knows what this means 😉 ) etc. All of these processes are actually handled at BACK END Farms. In order for your eggs (data) to get to the customers (users), you had to create a path that points customers down the right road to get to the eggs. The place where that path ends, where the eggs are picked up, is the endpoint.

To add one more piece to the analogy, picture distributors putting in orders for eggs and delivering them to the restaurants or markets. Distributors that follow this path back and forth between the restaurants and the farm, delivering eggs, are like other developers using your API. An API is a way for customers of any kind to use what you have at BACK END Farms. You can have multiple APIs for accessing different things on the back end, just like if Grandma decides to sell corn as well, she will have an API that gives access to a corn endpoint. Additionally, multiple users can access the same API, just like multiple delivery trucks can come to the farm to get eggs or corn.

To recap: backend = what happens on the server side (behind the scenes at the farm), endpoint = where data (eggs) is obtained for users (customers/clients), API = what connects the user to the endpoint.
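For readers who want to see the farm in code, here is a small sketch using only Python’s standard library. Everything in it is made up for illustration: the `/eggs` and `/corn` endpoints, the inventory numbers, and the names `FarmHandler` and `fetch`. The server plays Back End Farms, each path is an endpoint, and the `fetch` function is the distributor making the trip.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical data kept on the back end (the farm)
INVENTORY = {
    "/eggs": {"item": "eggs", "dozens_available": 12},
    "/corn": {"item": "corn", "ears_available": 40},
}


class FarmHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Each path is an endpoint: the place where one kind of data is picked up
        data = INVENTORY.get(self.path)
        status = 200 if data is not None else 404
        if data is None:
            data = {"error": "no such endpoint"}
        body = json.dumps(data).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch(host, port, endpoint):
    """The client side: a distributor driving to one endpoint and back."""
    connection = HTTPConnection(host, port, timeout=5)
    try:
        connection.request("GET", endpoint)
        response = connection.getresponse()
        return response.status, json.loads(response.read().decode("utf-8"))
    finally:
        connection.close()


# Start the back end on a free local port (port 0 lets the OS choose one)
server = HTTPServer(("127.0.0.1", 0), FarmHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address

eggs_status, eggs = fetch(host, port, "/eggs")
corn_status, corn = fetch(host, port, "/corn")
milk_status, milk = fetch(host, port, "/milk")  # the farm does not sell milk
print(eggs_status, eggs)
print(corn_status, corn)
print(milk_status, milk)

server.shutdown()
server.server_close()
```

Asking for something the farm does not offer (`/milk`) comes back with a 404 status instead of data, which is how a real API tells a client that an endpoint does not exist.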

Right now, I am using an API call to connect to a database so that I can get data to use in a web app. APIs and endpoints are used to access many different “backend” things. They are extremely useful, and I hope this has helped you understand more about how they work!

BioCodeBox Begins


Thanks for joining me!

“The capacity to learn is a gift; The ability to learn is a skill; The willingness to learn is a choice.”

 — Brian Herbert

I think the first blog post is the hardest, so I’ll start with the basic goals for BioCodeBox. To keep it simple, this is intended to be a box of tutorials for learning to code. Conceptually, the lessons will be applicable to many different kinds of data; however, processing biological data puts a fun flavor on things. You will learn that coding concepts are often transferable between languages, fields, and sub-fields once you understand them. Learning to focus on concepts over syntax will make you a more dynamic and marketable programmer. You can analyze traffic statistics with some of the same tools you would use to analyze disease statistics. I will also include some genetics, molecular and cellular biology tutorials as needed, or just when I get tired of boring data!

A little about me: bioinformatics is what gets me excited about coding and manipulating data. I have a never-ending passion for learning, researching, analyzing and planning, which I have finally decided to put to good use!

And so BioCodeBox begins.

Thanks for reading!