Updated: Nov 14, 2020
Hassan Kane, Data Scientist at Emtropy Labs
Hassan kicks off our Featured Startuper series, where we are excited to introduce you to some of our speakers and the work they’re doing in the startup space. Read on for a glimpse into the conversation at our Startup Boston Week 2020 event: A Bang for your Buck: A Cost-Effective Data Strategy for Startups.
Hassan is the Lead Data Scientist at Emtropy Labs, where he directs machine learning efforts to analyze customer service interactions and build high-performance customer support teams.
Prior to this role, Hassan was the CTO of Sela Technologies, where he built an enterprise platform for transparency and accountability in sustainable development projects. Before diving into the startup world, he studied Computer Science at MIT and dedicated his time to volunteering on the MIT campus with programs such as the F/ASIP Mentor and USAGE Committee.
When I met with Hassan, his clear passion and mastery of his craft were immediately apparent. With the experience he gained working in his field, he has some great insights into the world of data science. While speaking with him, I was curious to find out more about how his passion for data science was sparked and what in his field has been inspiring him amid a really weird world.
Hassan Kane: Coming into college, I was very interested in math/physics. I decided to major in computer science as it built on the abstract thinking skills honed in math and physics while being highly applicable.
For the mathematically inclined within computer science, there are two strong potential directions. The first is cryptography, which is used to design secure communication schemes and encryption protocols. The second is data science, which is about augmenting products and processes by deriving insights from existing data, or creating new products that classic software cannot enable.
Examples of new applications powered by data science that I got to work on include autonomous vehicles and remote sensing, which is about creating tools to analyze satellite images using computer vision techniques.
What attracted me to data science is its reliance on mathematical methods while also being inherently interdisciplinary. This skillset feels like a passport that allows me to travel across many fields, while still being able to do math. It doesn’t get any sweeter than this!
Alicia: What are some of your hobbies outside of work?
Hassan: One of them is community building. I strongly believe in building communities of people interested in various things. I am a part of virtual book clubs and also enjoy riding bicycles.
Alicia: Well, building communities is exactly what Startup Boston is all about, so it’s definitely great to be associated with people such as yourself who have similar values. To dive a little into Startup Boston Week 2020, what will be some key topics you'll be discussing during your event?
Hassan: The first one would be discussing the data life cycle and getting value out of data: learning how to gather data based on behavior or instance, labeling the data correctly, and really utilizing your startup’s data to train and improve models and keep track of their performance over time. Think about all the customer data at your disposal. It might be overwhelming at times, but to improve models it’s important to keep track of how they are doing over time.
The second key topic will be keeping track, over time, of the assumptions that power your data science models, and refocusing on a post-COVID world.
COVID-19 has made some models a little unreliable because the assumptions under which the data was gathered have changed during this pandemic. We are moving into a new normal, which means that the behavior of customers or businesses will be different, right? That's why it’s important to keep both pre-COVID and post-COVID data somewhere, so you can keep track of the model and the shifts in behavior.
The third key topic is tied in to the first two. We’ll call this one optimizing labeling efforts. Data labeling, in the context of machine learning, is the process of detecting and tagging data samples. The process can be manual but is usually performed or assisted by software. So forming a data strategy is not just about gathering data; it’s about making sure that your data science models include the different behaviors you want the models to pick up. I will be discussing how to focus on deploying labeling efforts to capture outcomes of different behaviors.
A fourth topic could be in relation to data strategy testing. In data science, people usually tend to think of testing along one dimension: “How well are the models predicting the outcome?” However, due to biases that are encoded in the historical data used to train models, a more prominent question now becomes: “Is this the right thing we want to learn from, or do we want to change the past?" It is extremely important to be able to answer this question: "Are we learning and developing?"
It's all about testing across multiple dimensions: verifying that we have met the goals and objectives we set, but also looking at whether there are other ways in which biases can creep into our data science efforts and have harmful effects. The realization is that data scientists cannot take a backseat and let the models learn from the data, but need to understand the assumptions that power the generative process of that data and make sure that the model picks up the right trends and doesn’t reinforce trends which we don’t want.
Alicia: So thinking about how you’ve built up data strategies, what are some challenges you've seen startups face when developing their own data strategy?
Hassan: A big challenge, I would say, is startups not knowing where to start. There is a distinction between a startup that has a core product that works well and might be starting to gather data from customer behaviors or different business outcomes, and a startup that has a product which is powered by successful data models. Each of these two types of startups comes with its own set of challenges.
The first type of startup would generally want to know how to start harvesting the data that is gathered by their current systems. Data scientists want to be doing meaningful work. Therefore, data efforts and hiring data scientists must be aligned with making informed business decisions. Know what you want to do and make sure that the objectives are clear for the team.
The other type of startup is the one powered by models (meaning that data science is a core part of the product or service offering). One of the challenges these startups face is technical risk. No machine learning model is perfect, so the challenge becomes how to deal with that uncertainty.
Another challenge I've seen startups face is in their data strategy skills. The skills required to develop the models may not be the same skill set required to deploy the models. Deploying them means you need a good engineering team in place to support the model development effort.
Alicia: So my next question is, what should startups look at to know whether their efforts to be data-centered have been effective? What are some vital things to have in your data strategy?
Hassan: I would say whether or not the startup has been able to harness existing tools to become better, and by better I mean being able to identify, for example, their ideal type of target market, business profile and features. Another way to analyze success would be the ability to make fewer decisions based on gut instinct and more informed decisions based on the data gathered. Human intuition is definitely important and always will be; it’s just about making sure that you are better informed through data when making decisions.
Want to hear more from Hassan? Check out the full event, A Bang for your Buck: A Cost-Effective Data Strategy for Startups at Startup Boston Week 2020, right here.