How to learn machine learning from scratch?

Machine learning is the buzzword these days, and everybody wants to know something about it. In the years to come, anyone who wants to stay competitive will need to understand it.

What is Machine Learning?

Traditional programs take data as input and produce data as output.

However, a machine learning algorithm takes data as input and produces a program as output. This machine-generated program can then take new data, process it and produce output data.

Machine learning algorithms automate the process of creating programs using historical data. In simple words, it gives computers the capability to extract knowledge from data and store it for future judgement.
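
To make the idea of "data in, program out" concrete, here is a minimal sketch in plain Python. It assumes the simplest possible learning algorithm, a least-squares straight line. The function name learn_line and the hours/tickets data are made up for illustration and are not from any particular library.

```python
def learn_line(xs, ys):
    """Learn y = slope * x + intercept from examples using least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    def predict(x):
        # This inner function is the "program" produced from the data.
        return slope * x + intercept

    return predict


# Hypothetical historical data: hours of product usage vs. support tickets.
hours = [1, 2, 3, 4, 5]
tickets = [3, 5, 7, 9, 11]  # follows tickets = 2 * hours + 1

model = learn_line(hours, tickets)  # data in, program out
print(model(6))                     # new data in, prediction out: 13.0
```

Nobody typed the rule "tickets = 2 * hours + 1" into the program. The learning step recovered it from the historical examples and packaged it as a function that can be applied to new data.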

Wikipedia defines machine learning as…

Machine learning is the subfield of computer science that, according to Arthur Samuel, gives “computers the ability to learn without being explicitly programmed”.

This brings us to our first introspective question…

Which kind of machine learner are you?

Depending on the role you have in your organization or the role you aspire to, you might fall into one of three categories of machine learners.

1. Business User

A business user is involved in the day-to-day running of a business. They run operations and are responsible for defining and executing the company's business processes. In traditional companies, executives, operations teams and managers fall into this category.

For this kind of user, a high-level, non-technical understanding of what machine learning can and cannot do is beneficial. They need just enough information to judge whether an investment in machine learning will pay off.

Here are some examples of successful use cases of machine learning:

  • Customer support teams have traditionally spent a lot of money on human staff. Machine learning can automate routine parts of operations such as customer support.
  • Machine learning can analyze large volumes of usage data (big data) and suggest business tactics that may increase revenue.

2. Machine Learning Engineers & Data Scientists

This learner is someone who will apply machine learning to real-life problems. They are the consumers of machine learning frameworks and libraries such as Watson, Spark or scikit-learn.

They will have a flair for playing with data. They will love to gather data, clean it, fill in missing information and then use it for machine learning. They are also called machine learning engineers or data scientists.

Industry is hungry for machine learning engineers and data scientists who can apply machine learning algorithms to help a business reduce costs or expand its revenue streams.

This group will not be responsible for creating algorithms. They will be users of existing machine learning libraries, applying them to the problem at hand.

They will be required to understand the strengths and weaknesses of different machine learning algorithms. They will have to know how a given algorithm behaves in different situations and which algorithm is best suited to a given type of problem.

They will be responsible for using programming (mostly Python, R, Scala or Java) to gather data from across the organization and from public sources, clean it and then massage it into a form that can be fed into machine learning algorithms.
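
As a rough illustration of what "cleaning and massaging" data can look like, here is a small sketch that uses only the Python standard library. The column names, the sample rows and the choice to fill a missing age with the mean are all assumptions made for this example. On a real project a library such as Pandas would typically do this work.

```python
import csv
import io

# Hypothetical raw export: stray spaces, a missing age and a missing city.
raw = """name,age,city
 Alice ,34,Pune
Bob,,Mumbai
Carol,29,
"""


def clean_rows(text):
    """Parse CSV text, trim whitespace and fill in missing values."""
    reader = csv.DictReader(io.StringIO(text))
    rows = [{key: value.strip() for key, value in row.items()} for row in reader]

    # Use the mean of the known ages as a simple stand-in for missing ones.
    known_ages = [int(row["age"]) for row in rows if row["age"]]
    mean_age = sum(known_ages) / len(known_ages)

    for row in rows:
        row["age"] = float(row["age"]) if row["age"] else mean_age
        row["city"] = row["city"] or "unknown"
    return rows


cleaned = clean_rows(raw)
for row in cleaned:
    print(row)
```

Filling a gap with the mean is only one of several possible strategies. Dropping incomplete rows or looking the value up in another data source are equally valid choices, depending on the problem.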

These users will combine art and science to solve the given business problem. A lot of their time will go into trial and error before they arrive at an optimum solution.

3. Theorists and Researchers

If you belong to this group, then you are working on creating cutting-edge machine learning libraries that will be used by machine learning engineers and data scientists.

You are likely to be an aspiring or active student of computer science or mathematics.

An example would be the group of students from the University of Waikato, New Zealand, who created an open source machine learning library called Weka.

The core team of IBM that developed Watson would also belong to this group.

Machine learning enthusiasts who contribute to open source projects like Spark and scikit-learn would also be part of this group.

How to go about understanding Machine learning?

The method of learning depends on which of the above types of learner you are.

If you are a business user who needs a high-level understanding, then your best bet is online research. There are plenty of resources on YouTube that will give you that preliminary understanding.

We at MCAL Global did a webinar that summarizes machine learning. We recommend you watch it. It is available at the following link:

Machine learning with Python webinar

Machine learning engineers and data scientists will also find the above webinar useful. It gives an overview of machine learning concepts and their applications, and lists the topics you need to learn to start this journey.

However, if you are an aspiring machine learning engineer or data scientist, you will need professional training.

Just reading free stuff online and watching free videos will not give you the kind of depth you are looking for. You will have to invest in some form of structured training with a mentor.

We at MCAL Global offer an online instructor-led training called “Machine learning using Python”, which helps people hone their machine learning and data science skills. It is a weekend course, so working professionals can also enroll in it. For detailed information on this course, email us at ml@mcal.in

If you plan to become a researcher who is looking to create machine learning algorithms, we recommend that you enroll in a college for a long term formal education in Computer Science or Mathematics.

If you are trying to decide whether or not to go down the path of machine learning, try to answer two questions:

  1. Does machine learning interest me?
  2. What are the future prospects in the field of machine learning?

You yourself are the best judge on your interest in this field. You will have to evaluate it yourself. No one can make that decision for you.

But if you have doubts about the future prospects of machine learning, then look at the following infographic:

[Infographic: Few Facts]

What should I do next?

Use the table below to identify which kind of learner you are. Then review the learning approach and the resources suggested for you.

What kind of learner am I? | What should be my approach? | Suggested resources
---------------------------|-----------------------------|--------------------
Business Learner (business users, executives and managers) | Online research and videos |
Machine Learning Practitioner (IT engineers and data scientists) | Online self-paced courses or instructor-led courses |
Theorists and Researchers (machine learning providers) | Enroll in a master's or postgraduate program at a university |

Do I need to learn programming?

If you are a business learner, then you don’t need to learn any programming. In fact, you don’t even have to know how the various algorithms work.

All you need to know is what machine learning is at a high level, what its strong points are and where it doesn’t work. Armed with this information, you will be able to make strategic decisions.

Any problem that can be solved by machine learning should have the following characteristics:

  • There should be a pattern in the data. Without this basic hypothesis machine learning doesn’t work. Machine learning doesn’t work on random data. So it’s crucial that the data collected for solving the problem has some patterns hidden in it. There is an underlying correlation that exists.
  • The pattern or correlation should not be known. There should be a general sense of pattern but the exact pattern should be unknown. Because if the pattern is known then what’s the point of machine learning?
  • There should be lots of relevant data. Machine learning algorithms are data hungry and work well when lots of relevant data exists for the algorithms to analyze and detect the patterns. As human beings learn from experience, the machines learn from data. The more data you have the more experienced your machine learning model will be.
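
The first characteristic can be checked with a small experiment. The sketch below is an illustration under stated assumptions, not a rigorous test. It fits a straight line to two made-up data sets, one where y depends on x and one where y is pure noise, and compares how much of the variation the line explains (the R² score). The helper name r_squared and both data sets are hypothetical.

```python
import random


def r_squared(xs, ys):
    """Fit a least-squares line and return the share of variance it explains."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # For a single-input linear fit, R^2 equals the squared correlation.
    return (sxy * sxy) / (sxx * syy)


random.seed(0)  # fixed seed so the run is repeatable
xs = [random.uniform(0, 10) for _ in range(500)]

# Data with a hidden pattern: y depends on x, plus some noise.
patterned = [3 * x + 2 + random.gauss(0, 1) for x in xs]
# Data with no pattern: y is drawn independently of x.
patternless = [random.uniform(0, 30) for _ in xs]

r2_pattern = r_squared(xs, patterned)
r2_random = r_squared(xs, patternless)
print(f"with pattern:    R^2 = {r2_pattern:.3f}")
print(f"without pattern: R^2 = {r2_random:.3f}")
```

The first score should come out close to 1 and the second close to 0. The same algorithm that learns the patterned data well has nothing to learn from the random data.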

A good overview of these concepts is in our machine learning webinar:

Machine learning with Python webinar

Which programming language should I learn?

If you want to become a data scientist or a machine learning engineer then you will have to pick a programming language.

Without knowledge of programming, you will not be able to use machine learning or create new algorithms. At some point you will have to delve into the programming side of the world.

For machine learning you don’t need to understand heavy-duty GUI-intensive, web-based or socket programming. All you need to know is how to read, write and manipulate data, and how to express the mathematical logic behind the algorithms in code.

In a nutshell your use of programming will be targeted towards machine learning. Now here is where the biggest question arises…

Which programming language should I learn?

There is a plethora of programming languages: Java, C, C++, .NET, Scala, Ruby, Python, R and more. Deciding which one to use gets confusing quite fast.

If you do a little online research, it will become clear that Python is emerging as one of the leaders in the machine learning space. It is followed by R.

You will notice that analysts who want to apply machine learning to solve real-world problems are jumping on the Python train.

Those who want limited programming and plan to stick to academics are going for R.

A quick Google search on the top programming languages will show you that Python is in the top 5.

R or Python?

To help you make a decision, we made a side-by-side comparison of the strengths and weaknesses of Python and R.

Python Programming Language

Strengths
  • Open source
  • Platform independent
  • Amazing data manipulation capabilities
  • Object-oriented programming
  • One of the top 3 programming languages in the world
  • Default programming language of data scientists
  • Amazing out-of-the-box scientific libraries
  • Huge community and fan following
  • Last but not least, easy to learn and use

Weaknesses
  • Takes more effort to learn than R
  • We are thinking very hard but can’t think of any other weakness

R Programming Language

Strengths
  • Open source
  • Loved by statisticians
  • Lots of out-of-the-box capabilities and algorithms
  • Huge community and fan following
  • Places emphasis on model interpretability rather than predictive analytics
  • Much easier than Python

Weaknesses
  • Not as flexible in data manipulation or data munging
  • Does not have the full flexibility of a general-purpose programming language

We suggest you take a deep breath and analyze your needs before picking your programming language. If you are confused and unable to decide, then just go for Python :). It is a good choice.

What topics should I cover in Programming?

In general learning a programming language is an ongoing process and involves a lifetime of learning.

Luckily, for machine learning and data science we don’t need to learn it all. We recommend focusing on the following topics in whichever programming language you pick.

  • Programming Basics – Keywords, Statements, Operators, Data Types
  • Flow Control – If else, For loop, While loop
  • Functions – User defined functions, Arguments, Return value
  • File Handling – List files, Read files, Write to files
  • Miscellaneous items – Exception Handling, Logging
  • Amazing Libraries (applicable to Python) – Matplotlib, NumPy, Pandas and Scikit-learn
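
To give a feel for how little is needed, the short Python sketch below touches most of the topics in the list above: a user-defined function, flow control, file handling, exception handling and logging. The file name scores.txt, its contents and the logger name are made up for this example.

```python
import logging
import os
import tempfile

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ml_basics")


def average(values, default=0.0):
    """User-defined function with arguments and a return value."""
    if not values:              # flow control: if / else
        return default
    total = 0.0
    for value in values:        # flow control: for loop
        total += value
    return total / len(values)


def read_numbers(path):
    """Read one number per line, skipping lines that are not numeric."""
    numbers = []
    with open(path) as handle:
        for line in handle:
            try:
                numbers.append(float(line))
            except ValueError:  # exception handling
                log.warning("skipping bad line: %r", line.strip())
    return numbers


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "scores.txt")   # hypothetical file name
    with open(path, "w") as handle:             # write to a file
        handle.write("10\n20\noops\n30\n")
    print(os.listdir(folder))                   # list files
    scores = read_numbers(path)                 # read the file back

log.info("average score: %s", average(scores))
```

The libraries in the last bullet (Matplotlib, NumPy, Pandas and Scikit-learn) build on exactly these basics, so time spent here is not wasted.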

Enough about Programming, what topics should I cover in Machine Learning?

Machine learning at its core is made up of Data Structures, Algorithms, Statistics, Linear Algebra, Probability theory and Calculus.

If you are beginning your career as a data scientist, you will need to learn about the following topics:

  • Statistics – Mean, Median, Mode, Standard Deviation, Normal Distribution, Z-Score, histograms
  • Probability – Basics, Bayesian Probability
  • Calculus – Gradient Descent, Root mean square, Distance
  • Machine Learning – Step Zero – Supervised Learning, Unsupervised Learning, Regression, Classification, Clustering
  • Machine Learning – Step One – Linear Regression, Polynomial Regression, Regularization, Ridge Regression, LASSO Regression, Logistic Regression
  • Machine Learning – Step two – Decision Trees, KNN algorithm, K Means Clustering, Principal Component Analysis, Linear Discriminant Analysis, Quadratic Discriminant Analysis
  • Machine Learning – Final Step – Deep Learning and other advanced algorithms
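
As a taste of the first few topics, here is a minimal sketch in plain Python. The first half computes the basic statistics with the standard statistics module. The second half fits a linear regression with gradient descent and reports the root mean square error. The sample data, the learning rate and the number of steps are assumptions picked for illustration, and in practice a library such as Scikit-learn would do the fitting for you.

```python
import statistics

# Hypothetical sample used only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)        # 5
median = statistics.median(data)    # 4.5
mode = statistics.mode(data)        # 4
stdev = statistics.pstdev(data)     # population standard deviation: 2.0
z_score = (9 - mean) / stdev        # 9 lies 2.0 standard deviations above the mean


def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by repeatedly stepping against the gradient
    of the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def rmse(xs, ys, w, b):
    """Root mean square error of the line y = w * x + b."""
    squared = [(w * x + b - y) ** 2 for x, y in zip(xs, ys)]
    return (sum(squared) / len(xs)) ** 0.5


xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]               # generated from y = 2x + 1
w, b = gradient_descent(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}, RMSE = {rmse(xs, ys, w, b):.6f}")
```

The fitted values should land very close to w = 2 and b = 1. If the learning rate is set too high the updates overshoot and the fit diverges, which is one reason the calculus behind gradient descent is worth understanding.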

We would love to hear from you about this article or any other question you may have. Please email us at ml@mcal.in with your questions and feedback. Feel free to reach out with your training or consulting needs as well.
