Palmetto Park's home-grown CRM, built by many freelancers and spread across disparate microservices, suffered from frequent failures and data integrity issues. The system's complexity and lack of observability made critical features like property alerts unreliable, risking missed business opportunities and escalating maintenance costs.
Drew had hired dozens of freelancers over the course of five years to build a home-grown CRM for his niche, yet sizable, East Coast real estate business, Palmetto Park Realty. Over time, Palmetto Park Realty ended up with a system that had been "duct taped" together by many developers with varying levels of skill and experience, and with no big-picture vision for the software or its integrated components.
As the system grew, things would fail in unexpected and difficult-to-diagnose ways. The property alerts system, for example, would sometimes work and sometimes not, and it was hard to figure out why. Drew’s team could miss great closing opportunities when alerts didn’t fire, and the buggy functionality was getting costly. There were also data quality and integrity issues, and the old CRM was tied to Angular v1 and an old version of MongoDB; upgrades were going to be very difficult without also upgrading Angular, which would have been a large project on its own.
Drew needed to:
Drew’s systems suffered from a classic "microservices too early" problem: services split apart without the scaling needs to justify it. Several independent systems communicated in a number of different ways, including webhooks and cron jobs on one system populating data that cron jobs on another system would read and check for changes, and it was a nightmare to debug. Each of these systems did something relatively straightforward, yet ran on its own VM or server with no observability; the only way to know whether a server was up and things were running as expected was for someone to manually SSH in and check.
Given all of this, Elyon proposed migrating existing critical systems to a modern Django server and stack that:
We decided to go for it, starting with the alerts system. We ran everything on one server and coordinated the parts of the Django stack with a simple Docker Compose setup: the web server, Celery beat, and the Celery workers all ran on the same server with the same configuration. We set up Sentry to work with all of them, including its logging integration, so that warning and error logs were sent to Sentry any time they showed up.
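Below is a minimal sketch of what that Sentry wiring can look like, assuming the sentry-sdk Django, Celery, and logging integrations; the DSN and log levels are placeholders, not the project's actual configuration.

```python
# Minimal Sentry setup sketch (placeholder DSN and log levels), shared by the
# web server, Celery beat, and the Celery workers via common Django settings.
import logging

import sentry_sdk
from sentry_sdk.integrations.celery import CeleryIntegration
from sentry_sdk.integrations.django import DjangoIntegration
from sentry_sdk.integrations.logging import LoggingIntegration

sentry_sdk.init(
    dsn="https://<key>@<org>.ingest.sentry.io/<project>",  # placeholder DSN
    integrations=[
        DjangoIntegration(),
        CeleryIntegration(),
        # INFO logs become breadcrumbs; WARNING and above become Sentry events,
        # so warning/error logs are reported the moment they show up.
        LoggingIntegration(level=logging.INFO, event_level=logging.WARNING),
    ],
)
```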
From there, we refactored the existing alerts code from one of the Python microservices and moved it into a Celery task scheduled by Celery beat. We also wrote very thorough unit tests against the existing alerts code, found a number of bugs, and uncovered a significant long-term issue with unbounded data growth and lag that was likely contributing to the performance degradation Drew had seen over time. We then tied everything together and got the task firing every 15 minutes, as requested.
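As a rough sketch of that wiring, assuming a hypothetical `alerts.tasks.send_property_alerts` task rather than the project's actual module layout, the Celery beat schedule might look like this:

```python
# Hypothetical Celery wiring: the refactored alert logic lives in a shared_task,
# and Celery beat schedules it every 15 minutes.
from celery import Celery, shared_task
from celery.schedules import crontab

app = Celery("palmetto")
app.config_from_object("django.conf:settings", namespace="CELERY")

app.conf.beat_schedule = {
    "send-property-alerts": {
        "task": "alerts.tasks.send_property_alerts",  # hypothetical task path
        "schedule": crontab(minute="*/15"),           # every 15 minutes
    },
}


@shared_task(name="alerts.tasks.send_property_alerts")
def send_property_alerts():
    """Run the refactored alert-matching logic from the old Python microservice."""
    ...
```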
Right away, Drew saw email send numbers on his Mailgun dashboard that he hadn’t seen in a long time, indicating alerts were firing properly and reaching users as expected. Beyond that, because everything ran through Celery beat with Django Celery Results, it was easy to see every time a job was scheduled to run and whether it succeeded or failed, and failures were sent to Sentry automatically, delivering a real-time notification that Drew or I could respond to immediately. Sentry surfaced another issue or two shortly thereafter, and we were able to patch the code and write tests that reproduced each issue.
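For illustration, with django-celery-results configured as the result backend (e.g. `CELERY_RESULT_BACKEND = "django-db"` in settings), each run is stored as a `TaskResult` row that can be reviewed from the Django admin or shell; the task name below is the same hypothetical one used above.

```python
# Inspect recent scheduled runs and their outcomes (SUCCESS / FAILURE) without
# SSHing into any server; TaskResult rows are written by django-celery-results.
from django_celery_results.models import TaskResult

recent_runs = (
    TaskResult.objects
    .filter(task_name="alerts.tasks.send_property_alerts")  # hypothetical task name
    .order_by("-date_done")[:20]
)
for run in recent_runs:
    print(run.date_done, run.status, run.result)
```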
Following the success of alerts, we continued on as follows:
After a number of these pieces had been built, we tackled the larger challenge of rebuilding the Agent lead matching system and Slack bot, which involved a number of complex pieces of logic but was critical to Drew’s business.
From there, we continued building new functionality and migrating old functionality until the old system was shut off.
Drew had a highly trafficked, well built-out website. While certain things were broken or intermittently failing, it still worked pretty well for the most part, and it was critical to keep it working as it already was while we built the new system.
Sometimes, migrations like this are handled through careful, incremental refactors to the existing system over time, without introducing a new one.
However, in our case, introducing a new system was a huge win, because it came with a new environment that brought stability, testability, monitoring, and overall robustness, and served as a foundation for future business ventures. At the same time, where it was needed, we wrote records to both the old MongoDB and the new PostgreSQL, so the old Node API would still return proper results in certain cases. With the new system, we could easily write robust tests to assert that we were inserting, updating, and deleting records appropriately, so that existing functionality would not be impacted, and we had monitoring for anything unexpected that came up. Every step of the way, we built the new world while still supporting the old one, until the old one was no longer needed.
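As an illustration of that dual-write approach, a helper could look roughly like the sketch below, assuming pymongo on the legacy side; the model, collection, and field names are hypothetical.

```python
# Hypothetical dual-write sketch: persist to the new PostgreSQL-backed Django
# model first, then mirror the record into the legacy MongoDB collection so
# the old Node API keeps returning correct results.
from django.db import transaction
from pymongo import MongoClient

from listings.models import SavedSearch  # hypothetical Django model

mongo = MongoClient("mongodb://legacy-db:27017")  # placeholder connection string
legacy_searches = mongo["crm"]["saved_searches"]  # hypothetical collection


def create_saved_search(user_id: int, criteria: dict) -> SavedSearch:
    with transaction.atomic():
        search = SavedSearch.objects.create(user_id=user_id, criteria=criteria)
    # Mirror into MongoDB after the PostgreSQL commit; tests can assert both writes.
    legacy_searches.insert_one(
        {"postgres_id": search.pk, "user_id": user_id, "criteria": criteria}
    )
    return search
```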
Elyon Technologies cared about the actual deliverable rather than jumping into the project without a strategy; we invested time in planning and in thinking through alternatives.