Project Title

Project 1

Research Area: Data Integration and Preparation for Analytics

Business Challenge:

Identify the factors driving increased attrition across organizations and propose a business solution to reduce it, or a backup plan to manage it.

Target Data Model:

Employee Attrition Analysis: Relationship Model, Network Model and Logistic Regression Model. We relate the datasets to one another to find and analyze correlations.

Dataset Link:


Six datasets are used for the attrition analysis: employee survey data, general data, in-time data, manager survey data, out-time data and a data dictionary. Together they describe 4410 employees of randomly chosen companies, covering their characteristics, personal details, opinions and survey results, which makes it possible to identify and analyze the employees' attrition tendencies.

Tableau Public Story Link:

Tools and Platforms used/to be used:

-    Python – Data Quality Check, Manipulation, Cleaning, Joining and Analysis
-    MS Excel – Data Type Updates and Data Curation
-    Tableau – Analysis, Visualization and Correlation Assertions
-    Power BI – Enhanced Analytics, Visualizations and Presentations
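
As a rough illustration of the joining and analysis step done in Python, the sketch below joins two of the datasets on an employee identifier and computes an attrition rate per satisfaction level. The column names (EmployeeID, Attrition, JobSatisfaction), the sample rows and the helper functions are assumptions made for illustration, not the project's actual code or data.

```python
from collections import defaultdict

# Hypothetical sample rows; the real column names and values may differ.
general_data = [
    {"EmployeeID": 1, "Attrition": "Yes", "Department": "Sales"},
    {"EmployeeID": 2, "Attrition": "No", "Department": "Sales"},
    {"EmployeeID": 3, "Attrition": "No", "Department": "Research"},
    {"EmployeeID": 4, "Attrition": "Yes", "Department": "Research"},
]
employee_survey = [
    {"EmployeeID": 1, "JobSatisfaction": 1},
    {"EmployeeID": 2, "JobSatisfaction": 4},
    {"EmployeeID": 3, "JobSatisfaction": 4},
    {"EmployeeID": 4, "JobSatisfaction": 1},
]


def join_on_key(left, right, key):
    """Inner-join two lists of dicts on a shared key column."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]


def attrition_rate_by(rows, column):
    """Share of rows with Attrition == 'Yes' for each value of `column`."""
    totals = defaultdict(int)
    leavers = defaultdict(int)
    for row in rows:
        totals[row[column]] += 1
        if row["Attrition"] == "Yes":
            leavers[row[column]] += 1
    return {value: leavers[value] / totals[value] for value in totals}


joined = join_on_key(general_data, employee_survey, "EmployeeID")
rates = attrition_rate_by(joined, "JobSatisfaction")
print(rates)  # {1: 1.0, 4: 0.0}
```

The same grouping function can be pointed at any other joined column, such as Department, to compare attrition across groups.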

Research Questions:

Based on the study and the datasets obtained, the following research questions were formed and analyzed:

1.    How are Environment Satisfaction, Job Satisfaction and Work Life Balance directly related to Attrition?
2.    What are the effects of Total Working Years and Job Satisfaction on Attrition across Departments and Job Roles?
3.    How do Performance Rating and Age factor into Attrition?
4.    Do Marital Status and Educational Qualification, together with Environment Satisfaction, contribute to Attrition?
5.    Do Travel Frequency and Job Involvement influence Attrition?
6.    How is Attrition affected by Job Level, Work Life Balance and Years since Last Promotion?
7.    Do employees' yearly average in-time and out-time contribute to Attrition?
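
For the question on in-time and out-time, one possible way to derive an average working day per employee is sketched below. The timestamp format, the sample values and the function name average_daily_hours are assumptions for illustration. Days with no record are skipped here, which is one choice among several.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp layout


def average_daily_hours(in_times, out_times):
    """Average hours between matching in/out timestamps, skipping missing days."""
    durations = []
    for t_in, t_out in zip(in_times, out_times):
        if not t_in or not t_out:  # no record for that day (absence or holiday)
            continue
        start = datetime.strptime(t_in, TIME_FORMAT)
        end = datetime.strptime(t_out, TIME_FORMAT)
        durations.append((end - start).total_seconds() / 3600)
    return sum(durations) / len(durations) if durations else None


# Made-up timestamps for one employee over three days, one of them missing.
in_times = ["2015-01-02 09:30:00", "", "2015-01-06 10:00:00"]
out_times = ["2015-01-02 17:30:00", "", "2015-01-06 17:00:00"]
print(average_daily_hours(in_times, out_times))  # 7.5
```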

Project 2

Research Area: Visual Analytics

Superhero Abilities Comparison from DC and Marvel


1.    heroes information.csv - Lists the characteristics of every superhero found in the source.
2.    super_hero_powers.csv - Contains information about every superpower and indicates whether it is present in a given hero.


About the data

The data describes the characteristics of superheroes from Marvel and DC. In addition to listing the names and characteristics of each superhero in the Marvel and DC universes, it includes a second dataset with extensive information about their powers, such as agility, durability and strength, along with other important factors. Combining the two datasets lets us extract insights into how these characters were created and how their abilities are defined. The dataset was collected in June 2017.


Reading through the metadata first gave us a clearer understanding of what each attribute in the dataset means. We started with initial processing, cleaning and sorting of the data in Excel. We then loaded the data into Python dataframes, selected the required features and columns, and brought them back into Excel for further processing and to combine the two datasets on the required attributes. Finally, Tableau visualizations let us analyze the combined data in a variety of ways and draw inferences from it. We also used Power Tallyv8 to confirm our analysis and to produce additional graphs and visualizations.
We took this approach because the dataset was extensive and split across two levels, and it produced helpful results.
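
A minimal sketch of combining the two datasets on the hero name is shown below. The column names (name, Publisher, hero_names), the True/False encoding of powers and the sample rows are assumptions for illustration, and the helper functions are hypothetical rather than the code we actually ran.

```python
import csv
import io

# Tiny made-up stand-ins for the two CSV files; real column names may differ.
HEROES_CSV = """name,Publisher
Hero A,Marvel Comics
Hero B,DC Comics
Hero C,DC Comics
"""
POWERS_CSV = """hero_names,Flight,Super Strength,Agility
Hero A,True,False,False
Hero B,True,True,True
Hero C,False,False,True
"""


def count_powers(powers_text, name_column="hero_names"):
    """Map each hero name to the number of powers flagged True."""
    counts = {}
    for row in csv.DictReader(io.StringIO(powers_text)):
        name = row.pop(name_column)
        counts[name] = sum(1 for flag in row.values() if flag == "True")
    return counts


def mean_powers_by_publisher(heroes_text, power_counts):
    """Join heroes to their power counts by name and average per publisher."""
    totals = {}
    for row in csv.DictReader(io.StringIO(heroes_text)):
        if row["name"] not in power_counts:  # hero missing from the powers file
            continue
        total, n = totals.get(row["Publisher"], (0, 0))
        totals[row["Publisher"]] = (total + power_counts[row["name"]], n + 1)
    return {publisher: total / n for publisher, (total, n) in totals.items()}


power_counts = count_powers(POWERS_CSV)
print(mean_powers_by_publisher(HEROES_CSV, power_counts))
# {'Marvel Comics': 1.0, 'DC Comics': 2.0}
```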

Main Outcomes and Evidence

The analysis yielded useful insights into the design and creation of these superheroes. It supports a number of inferences about the thinking behind the creation of each superhero and about how a character's characteristics determine the features and powers given to it. We intend to present these findings in our project report.

Project 3

Research Area: Social Computing

Research Topic: Impact of Social Media Aid on Workplace Learning


Does social media enhance workplace learning for working professionals?


•    Data Gathering.
•    Review of about 40 to 50 research papers.
•    Comparative analysis to connect various findings which can conclusively give us an objective answer to our research question.

Strategy and Implementation

Our basic strategy was to start by listing all the research papers and then read, study and research them further. Midway, however, we realized that we should first reach 75% of our target, take stock of what we had covered and what was still missing, and then look for more research papers as needed. This part of the task remains to be completed, as does the draft of our final research paper, which we intend to begin once our final set of research papers and the tag mapping are complete.

We plan to classify our research areas by tag, organize the final paper into topics based on those tags, and carry out detailed analysis and reporting in each subsection. We intend to produce the final research paper before the project deadline, and our progress currently matches the timeline.


We have come across a number of challenges and roadblocks and tackled them as best we could. Some milestones remain, and further barriers may arise. The challenges encountered so far, and those we anticipate, are listed below:

•    Finding relevant and helpful research papers.
•    Mapping the research we have studied to our area of interest and assessing its contribution.
•    Deciding which sub-research topics to incorporate and determining the tags.
•    Distributing the research papers across the tags and making sure each paper's findings are taken into account under its associated tags.
•    Many social media sources and platforms, e.g. LinkedIn and Indeed, have not been studied extensively for corporate environments by the research community.
•    Mapping attrition to social media involvement is very difficult to understand and ascertain. It could be a key research factor, but we have not yet found any strong research study on it.

Project 4

Research Area: User Interface & User Experience

Topic: UI/UX Analysis of Webstore

Executive Summary

Terrapin Tech works exclusively with University of Maryland (UMD) students, faculty, and staff. Many people in UMD’s community are passionate about technology, and the store's on-campus location means its staff can share that passion with them as they try out the newest gear the store has to offer.

So, whether it’s helping a student select a computer to pursue their passion in video production or helping a faculty member pick a notebook that allows them to do statistical analysis on large data sets, it’s fun helping them decide. We present an exhaustive User Interface and User Experience analysis of the UMD Terrapin Tech webstore, aimed at enhancing the webstore, improving the business and upgrading the customer experience.


The team ran four of the seven UI/UX tests that can be performed to evaluate a webstore or website. The seven tests are listed below:

1.    Heuristic Evaluation
2.    Comparative Evaluation
3.    Surveys
4.    User Interviews
5.    Usability Test
6.    Flow Testing
7.    Split Test

Project 5

Research Area: Data Science

Topic: An Analysis of IMDB Movie Reviews

Research Question:

Can Online Reviews and Ratings Predict Consumer Sentiment behind a Movie?


In our study, we primarily wish to understand the role of online movie reviews and ratings in predicting consumer sentiment on the IMDb platform. In this milestone, we run various linear and classification models to analyze our research questions, which are outlined below:

RQ1a: What features are predictive of movie review sentiment?
RQ1b: Is movie review sentiment predictive of the numeric rating left by the user?
RQ1c: Does the analysis of movie review sentiment suggest a better scale on which to rate movies or other products?
RQ2: Can we predict how helpful a review would be? What is the most important factor that makes a review a helpful review?
RQ3: What is the impact of different factors like movie genre, star cast, movie budget etc. on the overall movie rating?

Data Source Link:

Data URL C.

About Data Source:

IMDB is an online platform that provides ratings and reviews for released movies. The platform uses extensive algorithms and methodologies to collect user-entered ratings and reviews and to determine a movie's overall features and rating. It is widely considered reliable, and many people visit the website before deciding to watch a movie.

Preliminary Text Analysis

To answer our first research question, we analyzed the entire corpus of text reviews for consumer sentiment using the R package “syuzhet”.

Total number of movies: 963

Total number of reviews: 390,284

We created a WordCloud for our text review corpus. The WordCloud highlights the words most frequently used by IMDB users when posting a review (the larger the font size of a word, the more frequently it appears). Words such as Amazing, Like, Great, Film and Movies are among the most frequently used in review comments. We also observe words such as Dark, Horror and Interesting, which seem to indicate that users talk about the genre and the overall feeling of the movie.
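
The word counts behind a word cloud of this kind can be sketched as follows. This is a plain Python illustration, not the tooling used in the study; the stopword list, the sample reviews and the function name word_frequencies are assumptions.

```python
import re
from collections import Counter

# Illustrative stopword list only; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "and", "is", "was", "it", "this", "of", "to"}


def word_frequencies(reviews, top_n=5):
    """Return the top_n most frequent non-stopword tokens across all reviews."""
    counts = Counter()
    for review in reviews:
        words = re.findall(r"[a-z']+", review.lower())
        counts.update(word for word in words if word not in STOPWORDS)
    return counts.most_common(top_n)


# Made-up review snippets.
reviews = [
    "Amazing film, great cast",
    "A great and dark film",
    "This film was amazing",
]
print(word_frequencies(reviews, top_n=1))  # [('film', 3)]
```

In a word cloud, each word's font size would then be scaled by its count.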

Regression Modeling (Linear, Multivariate and Logistic)

To answer research question RQ1a (What features are predictive of movie review sentiment?), we consider both linear and multivariate regression models with the eight emotions as our features/independent variables and use them to predict the overall review sentiment.

Dependent variable: review_sentiment_score
Independent variables: anger, joy, sadness, disgust, fear, anticipation, surprise, trust
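
To make the multivariate setup concrete, the sketch below fits an ordinary least squares model by solving the normal equations in plain Python. It is an illustration under assumptions, not the models we ran: it uses only two of the eight emotions (joy and anger) to keep the toy data small, the numbers are made up, and fit_ols and solve_linear_system are hypothetical helper names.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting.

    Raises ZeroDivisionError if the matrix is singular.
    """
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_ols(features, target):
    """Return [intercept, coef_1, ...] minimising the squared error."""
    x = [[1.0] + list(row) for row in features]  # prepend the intercept column
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(x, target)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Made-up data generated from: score = 0.5 + 2 * joy - 1 * anger
emotions = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]  # (joy, anger)
scores = [0.5, 2.5, -0.5, 1.5, 3.5]

intercept, joy_coef, anger_coef = fit_ols(emotions, scores)
print(round(intercept, 6), round(joy_coef, 6), round(anger_coef, 6))
```

With all eight emotions, each row of features would simply carry eight values instead of two. The sign and size of each coefficient then indicate how that emotion moves the predicted review_sentiment_score.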

Project 6

Research Area: Information Management of Professionals

Topic: Stack Overflow User Analysis

Description of Data

The dataset presents the results of the Annual Developer Survey of Stack Overflow users (the developer community), covering their background, skill sets, knowledge base, technology expertise, job satisfaction and expectations, and income statistics.

Research Questions

1.    Does job satisfaction come with higher salary compensation/expectations?
2.    If large multinational corporations want to offshore their workload, which countries are suitable in terms of logistics and workforce?
3.    Does location affect skill sets and technical expertise?
4.    Does the number of years in a job affect a worker's willingness to switch?
5.    Can a person's stability be assessed from years of work, competencies, age, salary and last job switch, in order to control attrition?
6.    Do people use sites like Stack Overflow to find new jobs?
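
As an example of how the first question could be approached, the sketch below compares median salary across job satisfaction levels. The field names (JobSatisfaction, Salary), the sample responses and the function name are assumptions for illustration and do not reflect the actual survey schema.

```python
from collections import defaultdict
from statistics import median

# Made-up survey responses; None marks a respondent who gave no salary.
responses = [
    {"JobSatisfaction": "Satisfied", "Salary": 90000},
    {"JobSatisfaction": "Satisfied", "Salary": 110000},
    {"JobSatisfaction": "Satisfied", "Salary": None},
    {"JobSatisfaction": "Dissatisfied", "Salary": 60000},
    {"JobSatisfaction": "Dissatisfied", "Salary": 70000},
    {"JobSatisfaction": "Dissatisfied", "Salary": 80000},
]


def median_salary_by_satisfaction(rows):
    """Group salaries by satisfaction level, ignoring missing salaries."""
    groups = defaultdict(list)
    for row in rows:
        if row["Salary"] is not None:
            groups[row["JobSatisfaction"]].append(row["Salary"])
    return {level: median(values) for level, values in groups.items()}


print(median_salary_by_satisfaction(responses))
# {'Satisfied': 100000.0, 'Dissatisfied': 70000}
```

The median is used rather than the mean here because salary data tends to be skewed by a few very high values.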

Target Audience for the Analysis

The analysis of the dataset selected for this project is useful to a wide range of users across a variety of demographics, who can apply the results to their own requirements. The possible target audiences are listed below:

1.    Companies working on hiring and team planning.
2.    Multinational companies planning to open offshore offices.
3.    Organizations using the analysis reports to improve employee retention.
4.    Educational organizations and researchers, who can use the statistical analysis of locations whose workforce has stronger or weaker competencies to plan improvements accordingly.
5.    Coders and developers looking for a job or a switch, who can identify the competencies they lack and understand salary structures to improve and plan their careers.
6.    Online platforms like Stack Overflow, which can analyze developer patterns and provide solutions that benefit their users/developers.