Welcome to my Data Science page!

Check out some of my Data Science projects

Four-Group Data Analysis & Visualization

Problem:

In our lab, we collected a variety of behavioral data from experimental subjects across four distinct groups. The challenge was that this data required extensive reformatting before it could be transferred to statistical analysis software for group-by-group testing and graphing. This manual reformatting was both time-consuming and susceptible to human error. Moreover, the statistical software produced only basic, aesthetically unappealing graphs, which limited the quality of data visualization.

Solution:

To overcome the inefficiencies and error-prone nature of manual data reformatting, a PySimpleGUI desktop application was created, tailored for processing behavioral data from uniformly formatted Excel datasheets. The application boasts a user-friendly graphical interface, enabling lab members to quickly extract relevant data, conduct group-by-group analyses, and generate comparative graphs with minimal effort. This solution allows users to perform these tasks with a few clicks, delivering analyzed results and high-quality visualizations of group differences within seconds, thereby streamlining the workflow and improving the quality of data representation.

PySimpleGUI

PySimpleGUI img

Graphs generated with SEM & ANOVA tables

Graph and table img

Short video demonstrating the desktop application's functionality

Least Different Group Matching Analysis & Visualization

Problem:

In experiments where pair-housed subjects undergo an initial baseline test and are then assigned to groups based on that baseline data, the challenge lies in assigning them to two groups (control vs. treatment) with minimal initial differences between groups. Doing so ensures that any biases from previous conditions do not influence the outcomes of subsequent experiments. Moreover, it is crucial to keep cage mates within the same group to prevent cross-group effects: if one subject experiences stress due to its group's treatment, it could affect the results of its paired partner in future tests. Compounding the challenge of identifying the two groups with the smallest difference is the fact that the task can involve comparing hundreds, if not thousands, of possible group assignments. Such extensive analysis would be impractical to perform manually.

Solution:

A PySimpleGUI desktop application was developed to analyze individual data from cage mate pairs stored in an Excel file. The application systematically evaluates every possible combination of group assignments to identify the combination where the data points exhibit the least overall difference while preserving their pairing. This analysis includes calculating and recording various metrics in an Excel output file, such as:
  1. All possible group combinations considered
  2. The means of all possible group combinations
  3. The absolute differences between data points across all group combinations
  4. The specific group assignments resulting in the smallest mean difference between groups
The program also displays a graphical representation of the group assignments using Matplotlib. The line plot depicts the data points of the least different group assignments, while also maintaining adherence to cage mate pairing. These lines are plotted to visually illustrate the minimal possible distance between data points. The application’s output provides a comprehensive overview of the analysis steps and results, facilitating informed decision-making for optimal group assignment in experimental setups.
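The exhaustive search described above can be sketched as follows. This is a minimal illustration, not the application's code: it assumes an even number of cages split equally between the two groups, and the function name, cage IDs, and baseline values are hypothetical.

```python
from itertools import combinations
from statistics import mean


def least_different_assignment(cages):
    """Find the pair-preserving split with the smallest mean difference.

    cages: dict of cage ID -> (baseline of subject 1, baseline of subject 2).
    Returns (mean difference, control cage IDs, treatment cage IDs).
    """
    cage_ids = sorted(cages)
    half = len(cage_ids) // 2
    best = None
    # Whole cages are assigned, so cage mates always stay together.
    for control in combinations(cage_ids, half):
        treatment = [c for c in cage_ids if c not in control]
        control_mean = mean([v for c in control for v in cages[c]])
        treatment_mean = mean([v for c in treatment for v in cages[c]])
        diff = abs(control_mean - treatment_mean)
        if best is None or diff < best[0]:
            best = (diff, list(control), treatment)
    return best


# Hypothetical baseline values for four cages of two subjects each.
cages = {
    "c1": (10.0, 12.0),
    "c2": (20.0, 22.0),
    "c3": (11.0, 13.0),
    "c4": (19.0, 21.0),
}
diff, control, treatment = least_different_assignment(cages)
```

Because the two groups are the same size, each split is visited twice (once with the labels swapped). That is harmless for finding the minimum, but the number of combinations grows quickly with the number of cages.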

PySimpleGUI

PySimpleGUI img

Graph generated showing the hypothetical two-group assignment whose groups are the least different from each other out of all possible group assignments

Line graph img

Short video demonstrating the desktop application's functionality

Robinhood Stock Data Analysis & Visualization

Problem:

The challenge was to develop a systematic approach for evaluating the performance of a portfolio of individual stocks over time.

Solution:

To address the challenge, a Python script was developed using the Robin_Stocks library and the Robinhood API to record daily stock information into a CSV file. The data is then analyzed and visualized with the Pandas and Matplotlib libraries. The script aggregates the performance data of individual stock selections and compares it against the aggregate performance of two diversified ETFs. This comparison is visualized through graphs that illustrate total return and other key performance metrics, enabling easy evaluation and benchmarking of the portfolio’s performance over time.
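The comparison step can be sketched as below. This is only an illustration under stated assumptions: the tickers and prices are made up, each basket holds one share of each ticker, and the rows stand in for the daily CSV records. No Robin_Stocks calls are shown, and plain Python lists are used in place of Pandas.

```python
def aggregate_series(rows, tickers):
    """Sum the closing prices of the given tickers for each day, in row order."""
    return [sum(row[t] for t in tickers) for row in rows]


def total_return_pct(series):
    """Percent change from the first value in the series to the last."""
    return (series[-1] / series[0] - 1) * 100


# Hypothetical daily closing prices (one dict per CSV row).
rows = [
    {"date": "2024-01-02", "AAA": 100.0, "BBB": 50.0, "ETF1": 200.0, "ETF2": 100.0},
    {"date": "2024-01-03", "AAA": 110.0, "BBB": 55.0, "ETF1": 204.0, "ETF2": 102.0},
    {"date": "2024-01-04", "AAA": 120.0, "BBB": 60.0, "ETF1": 210.0, "ETF2": 105.0},
]

picks_return = total_return_pct(aggregate_series(rows, ["AAA", "BBB"]))
etf_return = total_return_pct(aggregate_series(rows, ["ETF1", "ETF2"]))
outperformance = picks_return - etf_return  # positive if the picks beat the ETFs
```

A real portfolio would weight each ticker by the number of shares held. The one-share-each basket here just keeps the arithmetic easy to follow.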

Script created using the Robin_Stocks Python library and the Robinhood API

Stock analysis graphs img

Wheel Running Data Analysis & Visualization

Problem:

In studies investigating the effects of exercise using a running wheel, our lab used a program that tracked total revolutions every minute. This data was logged into an Excel worksheet each minute over a 24-hour cycle, creating one sheet per subject for each 24-hour cycle. For many subjects over extended periods, this generates millions of data points. The analytical challenge included counting only valid running bouts (≥3 revolutions per minute) and distinguishing running behaviors between the light (inactive) and dark (active) phases of the subject’s light cycle. The goal was to process and analyze this extensive dataset efficiently, quantifying various aspects of running behavior while accounting for phase-specific activity patterns.

Solution:

To address the data management challenges posed by these running wheel experiments, a user-friendly graphical interface program was developed. This program compiles and organizes the raw data from Excel files into a ‘Combined Data’ sheet alongside the original data sheets, categorizing running behaviors and distinguishing them by the subjects’ inactive (light) and active (dark) phases. Automating data calculation and aggregation significantly reduced the manual workload and minimized errors, accomplishing in seconds what would otherwise have taken hours. Additionally, the program features a bar graph visualization tool, providing users with a quick and informative visual analysis of the dataset.
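A minimal sketch of the phase-aware counting is shown below. It is not the program's actual code and makes several assumptions: one subject's 24-hour sheet is a list of 1,440 per-minute revolution counts, minute 0 is lights-on, and the dark phase spans minutes 720 to 1439. The function name and sample values are hypothetical.

```python
MIN_REVS = 3  # a minute counts as valid running at >= 3 revolutions


def summarize_by_phase(revs_per_minute, dark_minutes):
    """Tally valid running minutes and revolutions for each light-cycle phase."""
    summary = {
        "light": {"minutes_running": 0, "revolutions": 0},
        "dark": {"minutes_running": 0, "revolutions": 0},
    }
    for minute, count in enumerate(revs_per_minute):
        if count < MIN_REVS:
            continue  # below threshold: not counted as running
        phase = "dark" if minute in dark_minutes else "light"
        summary[phase]["minutes_running"] += 1
        summary[phase]["revolutions"] += count
    return summary


# Hypothetical 24-hour record with a few active minutes.
revs = [0] * 1440
revs[100] = 5   # light phase, valid
revs[101] = 2   # light phase, below threshold
revs[800] = 10  # dark phase, valid
revs[801] = 3   # dark phase, exactly at threshold

summary = summarize_by_phase(revs, dark_minutes=range(720, 1440))
```

Running the same function over every subject's sheet and collecting the results would give the kind of rows a combined summary sheet holds.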

PySimpleGUI

PySimpleGUI img

Graph generated showing mean values of different wheel running behaviors during the animals' active (dark) 12-hour phase versus their inactive (light) 12-hour phase.

Running wheel behavior graph img

Short video demonstrating the desktop application's functionality

Financial Data Analysis & Visualization

Problem:

Credit card companies and banks typically offer basic charts that group spending into broad categories. While useful for a general overview, these charts fail to track spending trends over time and frequently miscategorize transactions. These inaccuracies can mislead users, affecting their ability to make informed financial decisions and plan budgets effectively. A more personalized and precise system, one that categorizes spending accurately and tracks detailed spending patterns over time, would provide more reliable data to support financial planning.

Solution:

To address the limitations of basic spending charts, a Python script was developed to offer a personalized and accurate view of spending data over time. This script processes CSV files from bank and credit card statements using the Pandas library, categorizing and analyzing the data in detail. It then generates visualizations with Matplotlib, such as stacked bar graphs, to present monthly spending across specific categories. This approach provides clear, detailed insights into financial activity, facilitating better tracking and analysis of expenditures.
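The categorize-then-aggregate step might look like the sketch below. It is an illustration only: the column names (`Date`, `Description`, `Amount`), the keyword rules, and the sample transactions are hypothetical, and it uses the standard `csv` module rather than Pandas so that it runs without extra packages.

```python
import csv
import io
from collections import defaultdict

# Hypothetical keyword rules; a real script would use a personal mapping.
CATEGORY_KEYWORDS = {
    "Groceries": ["market", "grocery"],
    "Dining": ["cafe", "pizza"],
}


def categorize(description):
    """Return the first category whose keyword appears in the description."""
    text = description.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "Other"


def monthly_totals(csv_text):
    """Return {month: {category: total}} from statement rows in CSV form."""
    totals = defaultdict(lambda: defaultdict(float))
    for row in csv.DictReader(io.StringIO(csv_text)):
        month = row["Date"][:7]  # assumes ISO dates such as 2024-01-05
        totals[month][categorize(row["Description"])] += float(row["Amount"])
    return {month: dict(by_cat) for month, by_cat in totals.items()}


sample = """Date,Description,Amount
2024-01-05,Corner Market,42.50
2024-01-09,Luigi Pizza,18.00
2024-01-20,City Grocery,30.00
2024-02-02,Hardware Store,12.25
"""
totals = monthly_totals(sample)
```

Each month's category totals correspond to one stacked bar in the kind of chart described above.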

Checking account activity divided into categories by month

checking account data graph

Credit Card expenditures divided into categories by month

credit card data graph

Activity Text File Data Extraction to Excel & Visualization

Problem:

In our lab, we quantified and analyzed the locomotion activity of experimental test subjects using an activity tracking chamber. This chamber records the subjects’ movements via infrared beam breaks, and the software outputs comprehensive data in a .txt file across various categories. However, our analysis required only specific portions of this data. The large amount of superfluous data in text format made locating, extracting, and transferring the relevant data into a formatted Excel spreadsheet for statistical analysis a manual, labor-intensive task that consumed significant time. The process was not only inefficient but also prone to errors, especially with large datasets from multiple subjects over extended experimental trials.

Solution:

To address the inefficiencies in handling locomotion activity data, a desktop application was developed with a user-friendly interface that automates the data processing workflow. The application transforms the raw .txt file into a standardized format, selectively extracts the relevant data categories required for analysis, and transfers this data into a properly formatted Excel spreadsheet. This automation reduces the processing time from hours to seconds and eliminates the risk of human errors associated with manual data handling. In addition to data handling, the program allows the user to output a bar graph of activity data over time for rapid analysis and anomaly detection.
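The selective extraction step can be sketched as follows. The chamber software's real file layout is not described here, so the `Category: v1, v2, ...` line format, the category names, and the sample text are all hypothetical. The sketch also writes CSV text instead of an Excel workbook to stay within the standard library.

```python
import csv
import io

# Hypothetical names of the categories needed for analysis.
WANTED = {"Distance Traveled", "Rearing Counts"}


def extract_rows(raw_text, wanted=WANTED):
    """Keep only wanted 'Category: v1, v2, ...' lines as [name, values...]."""
    rows = []
    for line in raw_text.splitlines():
        name, sep, values = line.partition(":")
        if not sep or name.strip() not in wanted:
            continue  # skip lines without a colon and unwanted categories
        rows.append([name.strip()] + [float(v) for v in values.split(",")])
    return rows


def to_csv(rows):
    """Serialize the extracted rows as CSV text, one category per row."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


# Hypothetical raw output for one subject, one value per time bin.
raw = """Subject: 12
Distance Traveled: 310.5, 220.0, 150.25
Stereotypy Time: 4.0, 5.0, 6.0
Rearing Counts: 12, 9, 7
"""
rows = extract_rows(raw)
csv_text = to_csv(rows)
```

Filtering against an explicit set of wanted categories keeps the unneeded output out of the spreadsheet without having to parse it.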

PySimpleGUI

PySimpleGUI img

Graph generated showing the mean locomotion of all subjects for each 5-minute time bin over the 1-hour trial period, along with the SEM of each bin. The graph is characteristic of the initial high exploratory behavior in a novel environment, which tapers off over time.

Activity bar graph img

Short video demonstrating the desktop application's functionality

Click icon to return to profile main page

home profile logo