This summer I was an intern at Numerify! Numerify is a tech startup providing a system of Intelligence for IT.
It was an incredible experience, and I learned so much in such a short period.
Numerify’s clients include companies such as Netflix, Coach, and McDonald’s. Numerify uses Business Intelligence and analytics tools to help these customers manage their IT data.
Over the course of the summer, I picked up a whole laundry list of acronyms used in the Business Intelligence (or should I say the BI) world:
RDBMS – Relational Database Management System
ITSM – IT Service Management
ITPA – IT Product Analytics
SDA – Software Development Analytics
ETL – Extract, Transform, Load
APM – Application Performance Monitoring
SLA – Service Level Agreement
CSAT – Customer Satisfaction
GUID – Globally Unique Identifier
POM – Project Object Model
REST – Representational State Transfer
CRUD – Create, Read, Update, and Delete
ER – Entity Relationship
OLTP – Online Transaction Processing
CGI – Common Gateway Interface
RDD – Resilient Distributed Dataset
HDFS – Hadoop Distributed File System
MPP – Massively Parallel Processing
RAID – Redundant Array of Independent Disks
SME – Subject Matter Expert
ROC – Receiver Operating Characteristic
Of course, learning acronyms isn’t the only thing I did 🙂
My main project was building a Machine Learning model to predict when a customer’s incident would escalate. Basically, an “incident” is when something goes wrong and it gets assigned to a certain level or “tier” to be solved. If the tier it was assigned to is not enough, it gets “escalated,” i.e. pushed up to a higher tier. My goal was to build a model that could predict that an incident would be escalated before it happened.
I am still super excited about this because it is actually something that real customers will get to use in the future. Though I code almost all the time, most of it is either for personal stuff just for fun, or it is for school assignments. This internship experience gave me the opportunity to create something that will be used in the industry by consumers. It is an entirely new perspective. In the past, I only had to be concerned about the efficiency and correctness of my own code and that was it. Here, I had to make sure my code integrated with pre-existing workflows and tools. There were many moving parts to creating the project, and I had to contact different teams to gather needed information and data for certain steps. It’s amazing to know that my work will be used in an actual industry setting.
I started off with raw customer data and then created a Decision Tree model and a Random Forest model to predict escalations in future data. I’ve elaborated on the process of creating a model more in this post along with a tutorial!
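To give a rough feel for the difference between a single tree and a forest, here is a tiny pure-Python sketch. It is not the model I built at Numerify: the incident data, the two features (reassignment count and hours open), and all function names are made up for illustration, and the "forest" is just bagged one-split trees (decision stumps), a simplified stand-in for a real Random Forest.

```python
import random
from collections import Counter

def majority(labels, default=0):
    """Most common label in a list (default if the list is empty)."""
    return Counter(labels).most_common(1)[0][0] if labels else default

def train_stump(rows):
    """Fit a one-split tree: try every (feature, threshold) pair seen in
    the data and keep the one with the fewest training errors."""
    best = None
    for f in range(len(rows[0][0])):
        for threshold in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= threshold]
            right = [y for x, y in rows if x[f] > threshold]
            left_label, right_label = majority(left), majority(right)
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    _, f, threshold, left_label, right_label = best
    return lambda x: left_label if x[f] <= threshold else right_label

def train_forest(rows, n_trees=25, seed=0):
    """Bagging: train each stump on a bootstrap sample (drawn with
    replacement), then predict by majority vote across the stumps."""
    rng = random.Random(seed)
    stumps = [train_stump([rng.choice(rows) for _ in rows])
              for _ in range(n_trees)]
    return lambda x: majority([stump(x) for stump in stumps])

# Hypothetical incidents: (reassignment_count, hours_open) -> escalated (1) or not (0)
rows = [
    ((0, 2), 0), ((1, 5), 0), ((0, 8), 0), ((1, 3), 0),
    ((4, 30), 1), ((5, 48), 1), ((3, 26), 1), ((6, 40), 1),
]

tree = train_stump(rows)
forest = train_forest(rows)

for incident in [(0, 1), (7, 50)]:
    print(incident, "tree:", tree(incident), "forest:", forest(incident))
```

A real Random Forest grows deeper trees and also samples a random subset of features at each split, but the core idea is the same: many slightly different trees vote, which smooths out the quirks of any single one.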
Along the way, I got to learn about a lot of different Machine Learning topics. Whenever I’m learning something new, I’m always on the hunt for the best visuals possible (I’m a very visual learner and truly believe a picture can benefit the learning process so much).
Here are some of the best visual explanations of certain topics I found:
Gini Impurity: https://www.quora.com/What-is-the-interpretation-and-intuitive-explanation-of-Gini-impurity-in-decision-trees
Bias vs. Variance: https://elitedatascience.com/bias-variance-tradeoff
Bagging and Boosting: https://quantdare.com/what-is-the-difference-between-bagging-and-boosting/
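If you prefer numbers to pictures, Gini impurity is also easy to compute by hand. The sketch below uses the standard formula (1 minus the sum of squared class proportions); the labels and the two example splits are made up for illustration and are not real incident data.

```python
from collections import Counter

def gini_impurity(labels):
    """1 - sum(p_k^2) over the classes: 0.0 means the group is pure,
    0.5 is the worst case for two equally likely classes."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def weighted_gini(left, right):
    """Impurity of a split: each side's impurity weighted by its size.
    A decision tree prefers the split with the lowest value."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) \
         + (len(right) / n) * gini_impurity(right)

# Hypothetical labels for eight incidents
parent = ["escalated"] * 4 + ["resolved"] * 4

# A split that separates nothing vs. one that separates everything
mixed_split = (["escalated", "escalated", "resolved", "resolved"],
               ["escalated", "escalated", "resolved", "resolved"])
clean_split = (["escalated"] * 4, ["resolved"] * 4)

print(gini_impurity(parent))        # 0.5
print(weighted_gini(*mixed_split))  # 0.5
print(weighted_gini(*clean_split))  # 0.0
```

The mixed split leaves the impurity unchanged at 0.5, so it tells the tree nothing, while the clean split drops it to 0.0, which is exactly the kind of split a decision tree goes looking for.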
One topic I couldn’t find good visuals for was the Spark Machine Learning Pipeline (perhaps because it is such a niche topic).
So I made my own!
I have elaborated a lot more on how to use these in my tutorial here: https://github.com/parmita52/machine-learning-spark-ex
Overall, I am incredibly proud of my progress and accomplishments at Numerify, and I know it was a summer well spent!