Team members
Pearl Park, Ayesha Halim, Nazanin Mehregan, Dorothy Lee, Olivia Langhorne
Project summary
CrowdGuard: Your dedicated community guard, designed to predict and prevent crowd-related disasters.
Keywords
Crowd disaster, Crowd safety, Machine learning, CNN, Unsupervised clustering, Multi-object tracking
Inspiration
Imagine attending a joyful celebration, only for the night to end in unimaginable tragedy. In 2022, during Halloween celebrations in Itaewon, Seoul, people flooded the narrow alleys, and what was meant to be a festival turned into a nightmare. The Itaewon crowd crush incident stands as a devastating reminder of how quickly a situation can spiral out of control when dense crowds gather in confined spaces. Over 150 lives were lost due to compressive asphyxia.
The heartbreak extends beyond the immediate tragedy: survivors were further victimized by media narratives that blamed the crowd for gathering, overlooking the critical failures in planning and crowd management. Such incidents are not isolated; they occur worldwide during major events, from religious assemblies to concerts and sporting events.
Over the past 24 years, more than 8,000 people have lost their lives in crowd-related incidents across 52 countries. These disasters occur when densely packed crowds reach a critical point where movement can no longer be controlled, leading to injuries or deaths from compressive asphyxia or trampling. Despite their chaotic nature, these incidents share a common thread: with better planning and faster response, many of these deaths could have been prevented.
This suggests that, with the right technology, governments can predict and prevent such disasters. However, technology alone is not enough. Effective cooperation between law enforcement, emergency services, and event organizers is also crucial for managing the aftermath of such incidents, ensuring that casualties are swiftly transported and treated, which significantly improves their chances of survival and reduces the impact of emergencies.
CrowdGuard tackles the critical issue of crowd safety with a cutting-edge approach that leverages machine learning to analyze real-time video footage. By predicting the likelihood of dangerous crowd densities before they escalate into disasters, CrowdGuard empowers authorities to act swiftly and effectively, minimizing harm and saving lives.
Our project is more than just a tool for preventing large-scale tragedies; it’s about fostering smarter, safer environments for everyone, everywhere. Although this approach may initially appear intrusive, CrowdGuard alleviates ethical concerns through the use of blurring techniques. Its primary objective is not to identify you but to safeguard your well-being. With advanced machine learning algorithms, CrowdGuard moves beyond reactive measures, offering a proactive solution that anticipates and prevents potential dangers before they become alarming.
CrowdGuard is committed to transforming crowd safety, promising a future where crowd-related disasters are significantly mitigated or even entirely prevented. Your safety and the efficiency of emergency responses are at the core of our mission, ensuring that every gathering and event can be enjoyed without fear of disaster.
List of technologies
Front end: React.js with HTML and CSS for the website layout and Material UI for the icons and components. We mocked up a prototype using Figma to include features that we want to incorporate in the future.
Pipeline: YOLOv8 (a CNN-based object detector) for person detection, unsupervised clustering with DBSCAN, and multi-object tracking with ByteTrack
Back end: Flask for API handling, Flask-SocketIO for real-time communication, and PyTorch for running models like YOLOv8 and CSRNet. OpenCV processes video frames, Flask-CORS manages cross-origin requests, and base64 encoding facilitates image data transfer.
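JSON has no binary type, so encoding each processed frame as base64 text lets it travel inside an ordinary JSON message to the browser. A minimal sketch of such a payload, assuming JPEG-encoded frames and a 0 to 2 risk field; the function name, field names, and Socket.IO event name are hypothetical, not CrowdGuard's actual API:

```python
import base64
import json


def make_frame_payload(jpeg_bytes: bytes, risk_level: int) -> dict:
    """Bundle an encoded frame and its 0-2 risk level for the front end.

    The image is sent as a data URL so a React <img src=...> can display it
    directly; the field names here are hypothetical.
    """
    encoded = base64.b64encode(jpeg_bytes).decode("ascii")
    return {"image": f"data:image/jpeg;base64,{encoded}", "risk": risk_level}


if __name__ == "__main__":
    # Placeholder bytes; with OpenCV they would come from
    #   ok, buf = cv2.imencode(".jpg", frame); jpeg_bytes = buf.tobytes()
    payload = make_frame_payload(b"\xff\xd8\xff\xe0placeholder", risk_level=1)
    message = json.dumps(payload)  # plain JSON text, safe to send as a text message
    print(message[:48])
    # With Flask-SocketIO this could be pushed to clients via
    #   socketio.emit("frame_update", payload)   # event name is hypothetical
```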
Project development
What it does
CrowdGuard is a web application designed for government agencies and event organizers to help prevent crowd disasters. It takes in real-time surveillance camera footage and, based on the predicted crowd density and flow rate, rates the risk of a crowd disaster on a scale from 0 to 2 (where 0 is low risk and 2 is high risk). If the score is high, the application alerts its users so they can act before a disaster occurs.
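The write-up does not state exactly how density and speed are combined into the 0 to 2 score. A minimal sketch of one possible mapping follows; the thresholds, the `GroupStats` structure, and the rule that a frame takes the worst score of its groups are all hypothetical choices for illustration, not CrowdGuard's calibrated logic. `area_m2` could be a known floor area (as in the simulations) or the area a cluster occupies in the bird's-eye view.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration only, not calibrated values.
DENSITY_WARN = 2.0    # people per square metre
DENSITY_DANGER = 4.0  # people per square metre
SPEED_SLOW = 0.5      # metres per second; below this the group is nearly stalled


@dataclass
class GroupStats:
    """Per-cluster measurements produced by the pipeline (hypothetical shape)."""
    people: int
    area_m2: float
    mean_speed_mps: float

    @property
    def density(self) -> float:
        return self.people / self.area_m2


def risk_level(group: GroupStats) -> int:
    """Map one group's density and speed to 0 (low), 1 (medium) or 2 (high)."""
    if group.density >= DENSITY_DANGER and group.mean_speed_mps < SPEED_SLOW:
        return 2  # very dense and barely moving
    if group.density >= DENSITY_WARN:
        return 1  # dense enough to watch
    return 0


def frame_risk(groups: list) -> int:
    """A frame scores as its most dangerous group (0 if there are no groups)."""
    return max((risk_level(g) for g in groups), default=0)


if __name__ == "__main__":
    sparse = GroupStats(people=10, area_m2=20.0, mean_speed_mps=1.2)  # 0.5 per m^2
    packed = GroupStats(people=45, area_m2=10.0, mean_speed_mps=0.2)  # 4.5 per m^2
    print(frame_risk([sparse, packed]))  # prints 2
```

Scoring the worst group rather than the average keeps one dangerously dense cluster from being masked by sparse pedestrians elsewhere in the frame.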
How does it work
Our pipeline processes individual video frames and performs human detection with YOLOv8, an efficient off-the-shelf model that detects objects and encloses them in bounding boxes. Non-maximum suppression prevents the same object from being detected twice. After the bounding boxes are filtered to keep only people, ByteTrack tracks each person across frames so that their speed (in m/s) can be calculated. Because surveillance footage is typically captured from an angle, the mid-bottom point of each bounding box (roughly where the person's feet meet the ground) is extracted and projected onto a 2D bird's-eye view through a perspective transformation. This top-down layout gives a better estimate of the distance between people, which can then be converted from pixels to centimetres. Using DBSCAN, an unsupervised clustering algorithm, we then separate lone pedestrians from groups of people based on their distances from one another. Finally, the pipeline calculates the crowd density of each subgroup every frame and its speed every 2 seconds, and uses these to display the likelihood of a crowd disaster on the 0 to 2 scale.
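The geometric core of these steps can be sketched in NumPy. This is a minimal illustration under stated assumptions, not CrowdGuard's code: boxes are (x1, y1, x2, y2) pixel coordinates already filtered to people, the four calibration points and their floor positions in metres are hypothetical, the homography is estimated with a direct linear transform (OpenCV's `cv2.getPerspectiveTransform` yields the same mapping from four point pairs), and DBSCAN is written out by hand rather than calling scikit-learn's `DBSCAN`. `speeds` assumes tracker ids mapped to bird's-eye positions sampled `dt` seconds apart (2 s in the text).

```python
import numpy as np


def foot_points(boxes: np.ndarray) -> np.ndarray:
    """Mid-bottom point (x, y) of each (x1, y1, x2, y2) person box, in pixels."""
    x = (boxes[:, 0] + boxes[:, 2]) / 2.0
    y = boxes[:, 3]  # image y grows downward, so y2 is the bottom edge
    return np.stack([x, y], axis=1)


def homography(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """3x3 matrix mapping 4 image points onto 4 ground-plane points (DLT)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]


def to_birds_eye(points: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Project (N, 2) image points through h and drop the homogeneous scale."""
    pts = np.hstack([points, np.ones((len(points), 1))]) @ h.T
    return pts[:, :2] / pts[:, 2:3]


def dbscan(points: np.ndarray, eps: float, min_samples: int) -> np.ndarray:
    """Cluster id per point; -1 marks noise, i.e. lone pedestrians."""
    n = len(points)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    neighbours = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    labels = np.full(n, -1)
    visited = np.zeros(n, dtype=bool)
    cluster = 0
    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        if len(neighbours[i]) < min_samples:
            continue  # not a core point; may still join a cluster as a border point
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster
            if visited[j]:
                continue
            visited[j] = True
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # j is a core point: keep expanding
        cluster += 1
    return labels


def speeds(before: dict, after: dict, dt: float) -> dict:
    """Speed of each track id seen in both snapshots, dt seconds apart."""
    return {
        tid: float(np.linalg.norm(np.subtract(after[tid], before[tid]))) / dt
        for tid in before.keys() & after.keys()
    }


if __name__ == "__main__":
    # Hypothetical calibration: image pixels marking a 10 m x 10 m floor square.
    img = np.array([[100, 400], [540, 400], [620, 700], [20, 700]], dtype=float)
    ground = np.array([[0, 0], [10, 0], [10, 10], [0, 10]], dtype=float)
    H = homography(img, ground)
    boxes = np.array([[300, 500, 340, 600], [350, 505, 390, 610],
                      [90, 620, 130, 690]], dtype=float)
    positions = to_birds_eye(foot_points(boxes), H)  # metres on the floor
    print(dbscan(positions, eps=1.5, min_samples=2))
```

Choosing metre-valued target points folds the pixel-to-distance conversion into the homography, so `eps` and the speeds come out directly in metres and m/s.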
Datasets we used
We explored several datasets for our project, including the ShanghaiTech crowd counting dataset, the UCF crowd dataset, and SOMPT22. These datasets provide images or videos of crowded areas, along with annotations such as bounding boxes around individuals and estimated crowd sizes. We tested YOLOv8 on these datasets and found that it performed well at drawing and blurring bounding boxes around the people in the footage. However, a key limitation was that the datasets lacked information on the probability of a crowd disaster occurring, as well as on the size of the spaces depicted. To address this, we simulated controlled environments with a known number of people in rooms or corridors of specific sizes to better predict the likelihood of a crowd crush. We then extracted videos from our AnyLogic simulation and tested our model on this footage.
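Because the simulation fixes both the number of agents and the floor area, it supplies the ground truth the public datasets lacked. A minimal sketch of how that ground truth could be used, assuming per-frame person counts from the detector and the known agent count from AnyLogic; the function names and the numbers in the example are hypothetical:

```python
def density(people: int, area_m2: float) -> float:
    """Ground-truth crowd density (people per square metre) of a simulated space."""
    return people / area_m2


def count_mae(detected: list, known: list) -> float:
    """Mean absolute error between per-frame detected and known head counts."""
    if len(detected) != len(known) or not known:
        raise ValueError("need exactly one known count per detected frame")
    return sum(abs(d - k) for d, k in zip(detected, known)) / len(known)


if __name__ == "__main__":
    # Hypothetical run: 240 agents in a 20 m x 4 m corridor.
    print(density(240, 20 * 4))  # prints 3.0 (people per square metre)
    print(count_mae([228, 235, 240], [240, 240, 240]))
```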
Impact/Innovation
Motivated by concerns over the safety of people in spaces prone to crowd disasters, we wanted to develop an application that would enable authorities to respond faster to emergencies and reduce the likelihood of crowd crushes. This, in turn, leads to a safer, happier, and more trusting society that can place its faith in the system to look out for it.
Because the software regularly scans crowds of people, there was a concern about using and storing people's images without their knowledge or consent. To address this, the region inside each bounding box that identifies an individual in the crowd is blurred before the video frames are processed further. Additionally, we chose to use simulations to recreate potential crowd crush scenarios rather than real crowd crush footage, due to the ethical concerns of using videos of people in distress and out of respect for those who have died in such events.
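A minimal NumPy sketch of the blurring step, assuming person boxes arrive as (x1, y1, x2, y2) pixel coordinates. It applies a simple k x k mean (box) filter to each box region; the function names and the default kernel size are hypothetical, and in an OpenCV-based back end `cv2.GaussianBlur` applied to each region would be the more usual choice.

```python
import numpy as np


def box_blur(patch: np.ndarray, k: int = 15) -> np.ndarray:
    """Mean-filter an (H, W) or (H, W, C) patch with a k x k window (k odd)."""
    r = k // 2
    pad = [(r, r), (r, r)] + [(0, 0)] * (patch.ndim - 2)
    padded = np.pad(patch.astype(float), pad, mode="edge")  # repeat border pixels
    h, w = patch.shape[:2]
    out = np.zeros(patch.shape, dtype=float)
    for dy in range(k):  # add up every shifted copy of the patch...
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return (out / (k * k)).astype(patch.dtype)  # ...then divide to get the mean


def blur_people(frame: np.ndarray, boxes, k: int = 15) -> np.ndarray:
    """Copy of frame with every (x1, y1, x2, y2) person box blurred."""
    out = frame.copy()
    height, width = frame.shape[:2]
    for x1, y1, x2, y2 in boxes:
        # Clip boxes that extend past the image border.
        x1, y1 = max(int(x1), 0), max(int(y1), 0)
        x2, y2 = min(int(x2), width), min(int(y2), height)
        if x2 > x1 and y2 > y1:
            out[y1:y2, x1:x2] = box_blur(frame[y1:y2, x1:x2], k)
    return out
```

Each region is blurred from the unmodified `frame`, so overlapping boxes never blur already-blurred pixels twice.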
One potential drawback of our application is that, because the footage lacks a known scale reference, we assumed that the average height of the bounding boxes around individuals in a frame corresponds to the global average human height of 168.5 cm, and used this to convert pixels to real-world distances. However, average height varies with race, sex, and socioeconomic status, so this standard could bias the predicted crowd density. Similarly, the YOLOv8 model we used might detect people of certain ethnic groups more reliably than others. Over time, these biases could be mitigated through more rigorous research and a more diverse training set.
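The height-based scaling reduces to a single ratio. A minimal sketch, assuming (x1, y1, x2, y2) boxes in pixels and the 168.5 cm average from the text; the function name is hypothetical:

```python
import numpy as np

AVERAGE_HEIGHT_CM = 168.5  # global average height assumed in the write-up


def cm_per_pixel(boxes: np.ndarray) -> float:
    """Scale factor that makes the mean person-box height equal 168.5 cm."""
    heights_px = boxes[:, 3] - boxes[:, 1]
    return AVERAGE_HEIGHT_CM / float(np.mean(heights_px))


if __name__ == "__main__":
    boxes = np.array([[0, 0, 50, 300], [100, 26, 150, 400]], dtype=float)
    scale = cm_per_pixel(boxes)  # mean height 337 px -> 0.5 cm per pixel
    gap_px = np.linalg.norm(np.array([120.0, 0.0]) - np.array([360.0, 0.0]))
    print(scale, gap_px * scale)  # 240 px apart -> 120 cm
```

Because density is measured per unit area, it depends on the square of this factor: an assumed height 5% too tall stretches every distance by 5% and lowers the estimated density by close to 10%, which is why a biased height standard matters.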
Challenges we ran into & how we overcame them
Lack of training dataset
Training a model like CrowdGuard requires footage of high foot traffic areas, and the footage we could find was limited in both quality and quantity. The picture quality had to be high enough for CrowdGuard to identify each person in the frame. The camera also needed to be positioned at an angle that let the model accurately identify each person and their distance from others, since this affects the accuracy of the crowd density calculations. As for quantity, we had access to only a limited amount of footage, which did not allow for robust training of the model.
To make up for the lack of a dataset to test our model, we used AnyLogic to simulate large crowds moving through a corridor. The angle of the simulated "camera" allowed the model to clearly identify each body with a bounding box. We also ran the simulation several times, varying factors such as crowd population and camera angle, to train the model on a larger simulated dataset.
What we learned & accomplishments we’re proud of
Throughout this project, we applied what we learned about machine learning models during the training weeks, choosing the ones that best fit the purpose of our pipeline. Despite time constraints, we are proud of having created a minimum viable product that demonstrates the essential components of the application and the pipeline we wanted to highlight.
What’s next for the project
While CrowdGuard has demonstrated considerable success as a prototype, several key areas present opportunities for improvement:
- Multi-Source Video Analysis
Currently, CrowdGuard analyzes video feeds from a single source. Enhancing the system to ingest and process video data from multiple locations simultaneously would provide a more comprehensive understanding of crowd dynamics across regions, increasing situational awareness.
- Data Persistence
CrowdGuard currently stores data only temporarily, which makes long-term analysis difficult. Moving to a robust, persistent database would enable analysis over time, improving data reliability and better informing authorities' decisions.
- Geospatial Mapping & Emergency Contact Integration
Developing geospatial mapping capabilities would improve the visualization of crowd dynamics across locations. This feature would help identify hotspots of activity and potential risks, supporting better resource allocation. Additionally, quick access to the relevant emergency services would improve responsiveness, not only during emergencies but also for crowd control when congestion begins to build.
- Crowd Sentiment & Behavioral Analysis
Integrating sentiment and behavioral analysis into the predictive model would be a significant advancement. By considering group emotions alongside speed and flow rate, the system could provide deeper insights into crowd dynamics and potential risks.
Acknowledgements & References
This project would not have been possible without the wonderful resources, informative lectures, and mentorship that the AI4Good Lab provided throughout the two months of the program! We thank all the coordinators, speakers, and lecturers for supporting us throughout this journey. Finally, a special shoutout to our mentor, Jacob Tian, and our TA, Yimei Yang, for their helpful advice and insights on turning our ideas into a real application!