In this case study, I’ll walk you through how I revamped Myrspoven’s customer-facing app by giving a neglected diagnostic tool a major overhaul. By leveraging my UX expertise, conducting observational user research, and implementing data visualization best practices, I transformed this essential but overlooked feature into the app’s most valued asset.
Myrspoven is a pioneer in AI-driven building optimization. Their core product uses an AI model to optimize a building's HVAC system every 15 minutes, finding a balance between indoor comfort and energy savings. Users primarily define and adjust the building's climate policy by setting min and max values for sensors and system components. These values act as guardrails, guiding the AI's output.
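To make the guardrail idea concrete, here's a rough sketch of how user-set min/max values might constrain the AI's output. The names and data shapes are hypothetical illustrations, not Myrspoven's actual code.

```typescript
// Hypothetical sketch of how min/max guardrails constrain an AI-proposed
// control policy. Names and data shapes are illustrative only.
interface Guardrail {
  signalId: string; // e.g. a supply-temperature setpoint or valve position
  min: number;      // lowest value the user will allow
  max: number;      // highest value the user will allow
}

// Clamp every AI-proposed value to its guardrail before the policy is sent
// on to the building's control system.
function applyGuardrails(
  proposed: Map<string, number>,
  guardrails: Guardrail[],
): Map<string, number> {
  const constrained = new Map(proposed);
  for (const g of guardrails) {
    const value = constrained.get(g.signalId);
    if (value !== undefined) {
      constrained.set(g.signalId, Math.min(g.max, Math.max(g.min, value)));
    }
  }
  return constrained;
}
```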
Of course, no AI is perfect, especially right out of the box. Buildings are complex, with multiple overlapping control systems and unpredictable human factors. It's still essential for end users to monitor the system and troubleshoot any issues, whether proactively or in response to tenant complaints. Myrspoven hoped that a better user interface would increase the number of proactive check-ins — ideally daily or a few times a week — and help decrease the number of system malfunctions and tenant complaints.
Myrspoven wanted to:
I was brought on to design a customer-facing product to help meet those goals.
After nearly a year and a number of fits and starts, we had a nearly finished product and were beginning usability testing. Testing revealed that there was one user need that wasn't being adequately addressed by the MVP.
A version of AI vs. baseline already existed internally and was widely used; the new interface had aimed to replace its functionality for external users with a combination of several new features. We had also hoped that, since we were encouraging customer users to work differently with the building, this specific data would no longer be as important. However, user testing revealed that our priorities were off — this feature was essential for both internal and external users.
AI vs. baseline shows a graph of the “control policy” sent by Myrspoven’s AI compared to the expected policy without AI. It serves 3 key purposes:
As part of the larger app project, I conducted user interviews, then usability tests with 5 internal users.
Users were asked to run through 2 tasks: a daily checkup on a building, scanning for problems; and a diagnostic workflow where they attempted to find ways to save more energy in the building, using the new app as much as possible. For both workflows, users wound up needing to open AI vs. baseline in the old interface in order to complete the tasks. This gave me the chance to observe how they used the current page, and led to some key insights about the feature:
Repetition, slow loading, and layout made it challenging to scan quickly in a checkup workflow
When a specific signal or group was needed, users searched with the browser (Ctrl+F)
Supplemental data was presented poorly and hard to access
With user needs definitively demonstrated, insights gathered, and ideas sparked, I started designing the new feature. My focus was on addressing the pain points above and implementing best practices for data vis, like removing “chart junk” and eliminating mixed units and double axes.
The AI vs. baseline feature was crucial to both of the app’s most important workflows: doing a proactive checkup and diagnosing a problem. It was also an important indicator of energy savings, one of the AI product's key value propositions.
During daily checkups, users would skim AI vs. baseline to make sure that the HVAC system outputs were what was expected, and that the AI control was actually saving energy. In a diagnostic workflow, after identifying a problem with sensors (think thermostats), users would come to AI vs. baseline to check the settings and output of the heating and cooling circuits themselves (think radiators or fans). It also helped them to determine if the problem was with the values the AI was sending or with the system itself.
The user flow below covers both workflows. It helped me understand that, while the user's mindset differs between the two, the initial actions on this page are very similar. Since buildings are so interconnected and complex, even users coming in with a defined problem in mind would be more likely to start by looking at the whole heating system rather than an individual circuit of the system. This meant that the ability to skim multiple graphs quickly was vital to both workflows, so I knew I could focus most of my energy there to start.
I added an unobtrusive search function and some basic filters to the top of the screen, allowing users to go directly to the system (heating, cooling, or ventilation) or individual signal they were interested in. Searching would filter the visible graphs as the user typed. I also included a date range picker that would adjust the timeframe for all visible graphs at once.
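A minimal sketch of that behavior, using hypothetical names rather than the production code: typing narrows the visible graphs, an optional system filter restricts them further, and one shared date range drives every graph at once.

```typescript
// Hypothetical sketch of the filter bar, not production code.
interface GraphCard {
  signalName: string;                              // e.g. "Heating circuit 1 supply temp"
  system: 'heating' | 'cooling' | 'ventilation';
}

interface FilterState {
  query: string;                                   // free-text search, applied as the user types
  system?: 'heating' | 'cooling' | 'ventilation';  // optional quick filter
  range: { from: Date; to: Date };                 // one date range shared by all visible graphs
}

// Return only the graphs matching the current search text and system filter;
// the shared date range is passed to each remaining graph when it renders.
function visibleGraphs(all: GraphCard[], filter: FilterState): GraphCard[] {
  const q = filter.query.trim().toLowerCase();
  return all.filter(
    (g) =>
      (!filter.system || g.system === filter.system) &&
      (q === '' || g.signalName.toLowerCase().includes(q)),
  );
}
```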
In early sketches like the ones shown here, I was still trying to make the dashboard card work as a data overview for this type of user. Here, I’m thinking through how to transition from the card to the graphs. Would users benefit from directly accessing single graphs from each row of the card? Could they use the card to build a “report” featuring only the graphs they needed? Or would these filtering tasks be better suited for another view?
Sketching helped me realize pretty quickly that we needed another view. User feedback consistently highlighted confusion with the percentages displayed on the card, suggesting that the card wasn’t providing the scannable overview they really needed. It was too simplified to be actionable. Finding concise, concrete, and actionable metrics for our portfolio and building overview dashboards was a challenge throughout the project, and one that the data scientists and engineers were working on when they had time. In the meantime, I wasn’t able to abandon the card presentation entirely, but I needed a strong entry point for this feature that offered a quick, scannable overview for technicians. That's when it hit me: what about presenting the graphs as small multiples?
As a designer and a journalism junkie, I’ve always enjoyed a good data visualization, but working on this project heightened my awareness of data visualization patterns even further. I had recently bookmarked a news story that showed a grid of scatter plots with the thought of using it for another app feature, but I realized it was also the perfect solution for AI vs. baseline. Such a presentation, called "small multiples," lets users compare trends and find patterns across multiple similarly-scaled datasets by presenting smaller graphs in a grid.
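As a rough, hypothetical sketch of the idea (not the implementation we shipped): render each signal as a small graph, but compute the scale once and share it, so every graph in the grid stays directly comparable.

```typescript
// Hypothetical small-multiples sketch: many tiny graphs drawn on one shared
// scale so trends can be compared at a glance. Simplified, illustrative only.
interface Series {
  name: string;
  points: number[]; // one sample per time step, same length for every series
}

// Build an SVG path for one series, scaled against a *shared* min/max so all
// the small graphs stay visually comparable.
function sparklinePath(s: Series, min: number, max: number, w = 160, h = 48): string {
  const span = max - min || 1;
  const steps = Math.max(s.points.length - 1, 1);
  return s.points
    .map((v, i) => {
      const x = (i / steps) * w;
      const y = h - ((v - min) / span) * h;
      return `${i === 0 ? 'M' : 'L'}${x.toFixed(1)},${y.toFixed(1)}`;
    })
    .join(' ');
}

// One scale for the whole grid, then one small SVG per series.
function renderSmallMultiples(series: Series[]): string[] {
  const all = series.flatMap((s) => s.points);
  const min = Math.min(...all);
  const max = Math.max(...all);
  return series.map(
    (s) =>
      `<svg width="160" height="48" aria-label="${s.name}">` +
      `<path d="${sparklinePath(s, min, max)}" fill="none" stroke="currentColor"/></svg>`,
  );
}
```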
Given our tight timeline and past struggles to get useful feedback from this test group based solely on wireframes, I knew I had to move fast to create fairly high-fidelity prototypes. Luckily, I already had the components library and design system I’d created for the redesign project, along with some well-established workflows for bringing real graphs from the old app into Figma. I mocked up two versions of the interface: one with the old single-column layout and another using the small multiples pattern, where users could “drill down” into a graph that interested them.
I quickly tested the prototypes with some internal users, who were thrilled about the filters and showed a slight preference for the small multiples layout. In the drilldown modal, they appreciated the bigger canvas for analysis and extra info, but they were missing one thing. The optional extra datasets — various averages, including weather — shown in the original graph provided vital context, so even an MVP of the feature would have to include them.
Even with that feedback, it was clear we had enough buy-in to start developing the feature. I refined the prototypes for the small multiples dashboard and graph modal, handed them off to the developers, and then got to work solving what I liked to call the three-axis problem.
Managing buildings involves dealing with a lot of data, even with automated systems like Myrspoven's. A single graph can show a technician that something is wrong, but usually not why. Was that sensor running cold because it was cold outside? Or was it a problem with the AI? Were there related issues elsewhere in the building? Issues can range from a broken part to a miscommunication between building systems to something as simple as a tenant opening a window. Context is crucial for solving the problems, and for figuring out whether an anomalous reading is even a problem in the first place.
The internal version of AI vs. baseline tried to solve this by cramming all the datasets a user might need onto a single graph. This quickly got messy — the datasets used different units and different scales, leaving a single graph with three units spread across two y-axes: effectively three different y-axes.
In the new version, I wanted users to be able to pinpoint exactly what needed attention and what to do about it, whether that meant making tweaks online or performing a hands-on repair onsite, while keeping screen-toggling to a minimum.
To accomplish this, I broke each dataset out into an individual graph. The graphs stack vertically, sharing a common time-based x-axis. Hovering over any graph creates tooltips at that point in time on all visible graphs. This way, as a user slides their cursor along the graph, they get a full view of everything that was affecting a signal's value at any given moment. Users can toggle data sources on and off, and drag to change the height of each graph if needed.
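A simplified sketch of that synced-hover behavior, with hypothetical names: because the stacked graphs share one time axis, a hover on any one of them can simply be broadcast to all of them.

```typescript
// Hypothetical sketch of the shared crosshair: every stacked graph subscribes,
// and a hover on any one graph moves the tooltip on all of them.
type CrosshairListener = (time: Date | null) => void;

class SharedCrosshair {
  private listeners: CrosshairListener[] = [];

  // Each stacked graph registers to be told where the crosshair is.
  subscribe(listener: CrosshairListener): void {
    this.listeners.push(listener);
  }

  // Called by whichever graph the cursor is over; passing null hides tooltips.
  moveTo(time: Date | null): void {
    for (const listener of this.listeners) listener(time);
  }
}

// Usage: each graph draws its own tooltip at the broadcast timestamp.
const crosshair = new SharedCrosshair();
crosshair.subscribe((t) => console.log('AI policy graph tooltip at', t));
crosshair.subscribe((t) => console.log('Outdoor temperature graph tooltip at', t));
crosshair.moveTo(new Date('2023-11-01T08:15:00'));
```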
The first iteration of the graph included a list of supplemental information in the sidebar. I realized that, by combining the individual averages with the name and actions from the chip on a thin card, I could reduce repetition and put the statistics in context, tying information more closely to action.
In some early sketches, I had included a quick edit box for the min and max of the circuit. Seems like an obvious win, right? We want to provide actionable insights. The problem is that when users tinker directly with these values, it overly constrains the AI, limiting its ability to save energy. I discussed this with our product owner and we decided that, in this case, a little friction would improve the experience. Instead of making the values directly editable from this page, I displayed them as text and added a link to click through to another view where they can be edited. Hopefully, that little bit of extra friction will discourage the tinkerers, while still keeping it easy enough to make changes when they're really, truly necessary.
While this feature was being designed and developed, I was also running a usability test with end users. We didn't directly test this feature with them, but hearing users describe how they used the data we provided, and observing how they did so, was essential and definitely shaped my approach to designing this feature. I was looking forward to exploring the insights below in the next version of the feature and app.
Users almost never looked at just one trend line. Helping them find the right groups of contextual information faster will improve their experience even more.
How might we help users visually distinguish between the sensor they’re analyzing and the others they’ve added for context?
Implementing features to help users better understand the relationship between circuits and sensors should be a high priority.
How might we provide even more contextual data, like min and max, on the graphs without making them cluttered and overwhelming?
I was also able to apply some of the layouts and components in other features of the app. Sensor Trends got a similar graph stack (shown below) and I even sketched a few other use cases for the small multiples pattern.
I planned to do follow-up usability testing for the next batch of features, including AI vs. baseline. I also set up analytics and began tracking a few metrics for usability and AI trust via survey — SUS (system usability scale) and TOAST (trust in automated systems test), respectively — to see if the interface improvements had been successful. Unfortunately, due to unforeseen circumstances, I wasn't able to see the project through to launch, leaving in November 2023. The app launched to users in February 2024.
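For reference, SUS scoring follows the standard published formula (nothing Myrspoven-specific): odd-numbered items contribute their response minus one, even-numbered items contribute five minus their response, and the sum is multiplied by 2.5 to land on a 0–100 scale.

```typescript
// Standard SUS scoring: 10 items rated 1-5. Odd items contribute (response - 1),
// even items contribute (5 - response); the sum is multiplied by 2.5.
function susScore(responses: number[]): number {
  if (responses.length !== 10) throw new Error('SUS expects exactly 10 responses');
  const sum = responses.reduce(
    (acc, r, i) => acc + (i % 2 === 0 ? r - 1 : 5 - r), // index 0 = item 1 (odd)
    0,
  );
  return sum * 2.5;
}

// Example: a fairly positive respondent scores 72.5 out of 100.
console.log(susScore([4, 2, 4, 2, 4, 3, 4, 2, 4, 2]));
```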
Even without the final results, based on the research I did complete and the best practices I put into place, I’m confident that this redesigned feature has significantly enhanced building management technicians’ daily experience in managing AI-enabled buildings. Feedback after launch called out the improved AI vs. baseline feature as “especially appreciated.”