Clustering Problem: Solving employee relocation problem using Foursquare API

Samuel Theophilus
Published in Analytics Vidhya
5 min read · Jul 5, 2021

Source: Unsplash

A company’s best resource is its employees: the people who bring creativity, productivity, and ultimately profitability. For an organization to grow, it needs to develop its employees, have a great product, an amazing marketing team, and an expansion plan. One challenge faced during expansion is the demand for skilled staff who are capable of nurturing new expansion sites (branches). An organization can choose to:

  1. Hire and train new staff and deploy them to new locations; or
  2. Relocate existing experienced Staff.

The second option is usually cheaper and more effective for the company, because current employees are already familiar with the company’s structure and operations. This minimizes both the cost and the time that new staff would need to adapt and handle the company’s operations. However, the organization needs strategies that ensure relocated staff are happy in their new locations. This makes staff movement a less daunting task for the company and a less stressful one for the employee.

Problem Definition

For this experiment, we will help a company, “The WXYZ Company”, solve its employee relocation challenge:

The WXYZ Company has been operating in Brooklyn, New York, USA for the past 5 years. This year, the board made the decision to open an office in Coventry, England, and would like to select some of its existing employees to fill some managerial roles at its new branch.

This Data Science project aims to compare the neighborhoods in Brooklyn, New York (Current company location) with the neighborhoods in Coventry (New Branch) and create Clusters of similar neighborhoods.

This project will help the company to:

  1. Identify Employees who would have a smoother transition to the new branch (by identifying if their current residential address matches a cluster in the new location).
  2. Identify Locations to consider as recommendations for employees who agree to relocate.

Data Sources

The Brooklyn Neighborhood Data was retrieved from a JSON File accessible here. This dataset contains a lot of information on New York; however, this project only extracted the following:

  • Borough
  • Neighborhood
  • Latitude
  • Longitude
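As a rough illustration of this extraction step, the sketch below pulls the four fields out of a GeoJSON-style document. The layout (`features`, `properties`, `geometry`), the key names, and the sample values are assumptions made for the example; the real file may be organized differently.

```python
import json

# Hypothetical sample mimicking a GeoJSON-style file; the real file's
# layout and key names may differ, and the coordinates are illustrative.
SAMPLE_JSON = """
{
  "features": [
    {"properties": {"borough": "Brooklyn", "name": "Bay Ridge"},
     "geometry": {"coordinates": [-74.030621, 40.625801]}},
    {"properties": {"borough": "Bronx", "name": "Wakefield"},
     "geometry": {"coordinates": [-73.847201, 40.894705]}}
  ]
}
"""


def extract_neighborhoods(raw_json, borough):
    """Return one dict per neighborhood in the requested borough."""
    rows = []
    for feature in json.loads(raw_json)["features"]:
        props = feature["properties"]
        if props["borough"] != borough:
            continue
        # GeoJSON stores coordinates as [longitude, latitude].
        lng, lat = feature["geometry"]["coordinates"]
        rows.append({
            "Borough": props["borough"],
            "Neighborhood": props["name"],
            "Latitude": lat,
            "Longitude": lng,
        })
    return rows


brooklyn = extract_neighborhoods(SAMPLE_JSON, "Brooklyn")
print(brooklyn)
```

Filtering on the borough while reading keeps only the Brooklyn rows, which is all this project needs from the larger New York dataset.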

The Coventry Neighborhood Data was scraped from the Wikipedia page that lists Postal Codes and their respective Neighborhood information. This data did not include latitude and longitude, so the geocoder Python library was used to look up the coordinates. The resulting dataset contains:

  • Postal Code
  • Borough
  • Neighborhood
  • Latitude
  • Longitude
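The coordinate lookup could be sketched roughly as follows. To keep the example runnable offline, the geocoding call is passed in as a function and replaced here by a stand-in with made-up coordinates; `add_coordinates`, the address format, and the `CV99` row are hypothetical names and values introduced only for illustration.

```python
def add_coordinates(rows, lookup):
    """Attach Latitude/Longitude to each row.

    `lookup(address)` should return a (lat, lng) pair, or None on failure.
    """
    enriched = []
    for row in rows:
        address = f"{row['Postal Code']}, Coventry, England"
        latlng = lookup(address)
        lat, lng = latlng if latlng else (None, None)
        enriched.append({**row, "Latitude": lat, "Longitude": lng})
    return enriched


# Stand-in lookup with made-up coordinates so the sketch runs offline.
# With the geocoder package installed, something like
#   lambda address: geocoder.arcgis(address).latlng
# could be passed instead (assumption: the article does not say which
# geocoding provider was used).
FAKE_COORDS = {"CV1, Coventry, England": (52.41, -1.51)}

rows = [
    {"Postal Code": "CV1", "Borough": "Coventry", "Neighborhood": "City Centre"},
    {"Postal Code": "CV99", "Borough": "Coventry", "Neighborhood": "Unknown"},
]
coventry = add_coordinates(rows, FAKE_COORDS.get)
for row in coventry:
    print(row)
```

Rows whose lookup fails keep `None` coordinates, so they can be retried or dropped later rather than silently receiving wrong values.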

The dataset presented above is not enough on its own to generate good clusters. It is reasonable to assume that the places people usually visit (gyms, restaurants, etc.) correlate with the type of environment they find comfortable.

This is why popular venues (places data from the Foursquare API) were used in this project. This information was extracted using the Foursquare API for neighborhoods in Brooklyn, New York, and Coventry. I explored the most common venue categories in each neighborhood. After data cleaning and transformation, I was able to generate the data structure as shown below.
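One way to turn raw venue lists into per-neighborhood features is to compute how often each category appears in each neighborhood and then rank the categories. The sketch below does this in plain Python on made-up venue data; the function names and the tie-breaking rule are my own choices for the example, not necessarily what the notebook does.

```python
from collections import Counter, defaultdict

# Made-up (neighborhood, venue category) pairs standing in for API results.
venues = [
    ("Bay Ridge", "Pizza Place"),
    ("Bay Ridge", "Pizza Place"),
    ("Bay Ridge", "Gym"),
    ("Bay Ridge", "Bar"),
    ("Earlsdon", "Pub"),
    ("Earlsdon", "Indian Restaurant"),
]


def category_frequencies(pairs):
    """Map each neighborhood to {category: share of its venues}."""
    counts = defaultdict(Counter)
    for neighborhood, category in pairs:
        counts[neighborhood][category] += 1
    return {
        name: {cat: n / sum(counter.values()) for cat, n in counter.items()}
        for name, counter in counts.items()
    }


def top_categories(freqs, n=3):
    """Return the n most frequent categories; ties are broken alphabetically."""
    return sorted(freqs, key=lambda cat: (-freqs[cat], cat))[:n]


profiles = category_frequencies(venues)
for name, freqs in profiles.items():
    print(name, top_categories(freqs, 2))
```

Using shares instead of raw counts keeps neighborhoods with many venues comparable to neighborhoods with only a few.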

What is Foursquare API?

The Foursquare Places API provides location-based experiences with diverse information about venues, users, photos, and check-ins. The API supports real-time access to places, Snap-to-Place, which assigns users to specific locations, and geotagging. For more information, visit this link.
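For a feel of what a venue query looked like, here is a minimal sketch that builds a request URL for the legacy v2 `venues/explore` endpoint. Treat the endpoint choice, the parameter defaults, and the placeholder credentials as assumptions: the article does not show the exact request, and the API has changed since this was written.

```python
from urllib.parse import urlencode


def build_explore_url(lat, lng, client_id, client_secret,
                      version="20210705", radius=500, limit=100):
    """Build a URL for the legacy v2 venues/explore endpoint (assumed here)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,            # API version date in YYYYMMDD form
        "ll": f"{lat},{lng}",    # latitude,longitude of the neighborhood
        "radius": radius,        # search radius in meters
        "limit": limit,          # maximum number of venues to return
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


# Placeholder credentials; real ones come from a Foursquare developer account.
url = build_explore_url(40.6258, -74.0306, "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
print(url)
```

The URL would then be fetched once per neighborhood, and the venue categories in each response collected for the analysis.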

Exploratory Data Analysis

Before we train the model, let us try to visualize the distribution of data. The graphs below visualize the top venues in both Brooklyn and Coventry.
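The numbers behind such a chart can be produced with a simple count of venue categories per city. The sketch below uses made-up Coventry data and prints a text-only bar chart; a plotting library would normally draw the real figure.

```python
from collections import Counter


def top_venue_counts(categories, n=5):
    """Return the n most common venue categories with their counts."""
    return Counter(categories).most_common(n)


# Made-up category list for illustration only.
coventry_categories = [
    "Indian Restaurant", "Indian Restaurant", "Indian Restaurant",
    "Liquor Store", "Liquor Store", "Pub",
]

for category, count in top_venue_counts(coventry_categories, 3):
    # One '#' per venue gives a rough horizontal bar.
    print(f"{category:<20} {'#' * count}")
```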

From this horizontal bar chart, it is clear that Indian Restaurants and Liquor Stores are the most common venues in Coventry. This could suggest that Indian employees, or employees who enjoy Indian cuisine, would feel comfortable in Coventry and might not have an issue relocating there from Brooklyn.

The Brooklyn chart shows that Pizza Places and Chinese Restaurants are the most common venues among neighborhoods in Brooklyn. Since these are not the most common venues in Coventry, this could mean that relocating Chinese or Italian employees might not be the best idea, as they may find fewer familiar venues there.

Clustering & Result Evaluation

Why Clustering?

Clustering is a Machine Learning technique that involves the grouping of data points. Data points that are in the same group should have similar properties, while data points in different groups should have highly dissimilar properties.

As discussed in the introduction, the purpose of this project is to find neighborhoods in Coventry that are similar to neighborhoods in Brooklyn, so that employees have a smooth transition when they relocate to the company’s new branch office in Coventry. Clustering (k-means in this case) is well suited to problems with this structure.
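To make the idea concrete, here is a small from-scratch k-means sketch run on made-up venue-frequency vectors for neighborhoods from both cities. It is only an illustration of the technique: a library implementation would normally be used in practice, and the neighborhood profiles, feature order, and `k` are assumptions for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as starting centroids
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Made-up frequency profiles: [pizza place, indian restaurant, pub]
names = ["Bay Ridge (Brooklyn)", "Earlsdon (Coventry)",
         "Bushwick (Brooklyn)", "Foleshill (Coventry)"]
points = [[0.6, 0.1, 0.3], [0.5, 0.2, 0.3], [0.1, 0.7, 0.2], [0.0, 0.8, 0.2]]

labels = kmeans(points, k=2)
for name, label in zip(names, labels):
    print(f"{name}: cluster {label}")
```

Because neighborhoods from both cities are clustered together, a Brooklyn neighborhood and a Coventry neighborhood that land in the same cluster can be treated as similar.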

Visualizing Results

Let's look at the query results of 2 Brooklyn neighborhoods and 2 Coventry neighborhoods clustered using K-means:

By merely comparing the values in the columns, you will be able to understand why these neighborhoods fall in the same clusters.

Now let’s look at the Clusters as visualized on a Geographical map to have a better idea.

Clusters on Brooklyn, New York Map

Clusters on Coventry, England Map

The WXYZ Company can look into the recommendations produced by this model and decide which neighborhoods it prefers. With similar neighborhoods identified, it is now possible to draft a list of recommendations from this cluster model.
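Drafting such a list can be as simple as looking up an employee’s current neighborhood, reading its cluster label, and returning the Coventry neighborhoods that share that label. The sketch below assumes the cluster labels are already available; the labels, neighborhood assignments, and the `recommend` helper are hypothetical.

```python
# Hypothetical cluster labels keyed by (city, neighborhood).
labels = {
    ("Brooklyn", "Bay Ridge"): 9,
    ("Brooklyn", "Bushwick"): 12,
    ("Coventry", "Earlsdon"): 9,
    ("Coventry", "Foleshill"): 12,
    ("Coventry", "Allesley"): 9,
}


def recommend(labels, home_neighborhood,
              home_city="Brooklyn", target_city="Coventry"):
    """Return target-city neighborhoods in the same cluster as the home one."""
    cluster = labels.get((home_city, home_neighborhood))
    if cluster is None:
        return []  # unknown neighborhood: nothing to recommend
    return sorted(
        name for (city, name), c in labels.items()
        if city == target_city and c == cluster
    )


print(recommend(labels, "Bay Ridge"))
print(recommend(labels, "Bushwick"))
```

An empty result signals that an employee’s current neighborhood type has no counterpart in Coventry, which is useful information when choosing who to relocate.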

The color codes in the two maps show that Cluster 9 (green), followed closely by Cluster 12 (orange), is a very common neighborhood type in Brooklyn and also appears, though less often, on the Coventry map. This indicates that employees relocating to Coventry are likely to have the smoothest transition in neighborhoods that belong to these clusters.
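The same observation can be checked numerically by counting how many neighborhoods each city has in each cluster and keeping only the clusters present in both. The assignments below are made up for illustration and do not reproduce the actual results.

```python
from collections import Counter

# Made-up (city, cluster label) assignments, one per neighborhood.
assignments = [
    ("Brooklyn", 9), ("Brooklyn", 9), ("Brooklyn", 12), ("Brooklyn", 3),
    ("Coventry", 9), ("Coventry", 12), ("Coventry", 5),
]


def shared_clusters(assignments):
    """Return {cluster: {city: count}} for clusters that occur in every city."""
    per_city = {}
    for city, cluster in assignments:
        per_city.setdefault(city, Counter())[cluster] += 1
    common = set.intersection(*(set(counts) for counts in per_city.values()))
    return {
        cluster: {city: counts[cluster] for city, counts in per_city.items()}
        for cluster in sorted(common)
    }


print(shared_clusters(assignments))
```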

Conclusion

In this study, I extracted Brooklyn and Coventry Neighborhood data along with their most popular places using the Foursquare API. I then transformed and clustered the data to find similar neighborhoods between Brooklyn and Coventry. All this was done to help the company identify residential neighborhoods that will give its staff the smoothest transition. International relocation is an attractive facet of a career at any company and is good for the company as a whole, and this project helps the company ease the relocation process.

For more details about this project:

GITHUB LINK TO NOTEBOOK

Samuel Theophilus
Analytics Vidhya

Machine Learning Engineer || Technical Writer || Data Engineer • Passionate about Computer Vision, NLP & Business Intelligence.