Creating an Interactive Data Visualization Tool for California’s Water Data Challenge

HydroDetectus
Mar 10, 2021

Written by: Jonathan Santoso, student member of HydroDetectus and Data Science Institute scholar at Columbia University, NYC, USA.

Advised and edited by: Dr. Jose Gustavo S. Paiva, visualization team leader of HydroDetectus and professor at the Faculty of Computing, Federal University of Uberlandia, Uberlandia-MG, Brazil

Courtesy: Dr. Jose Gustavo of HydroDetectus

Developing effective visual analysis strategies can be challenging. For California’s Water Data Challenge, we are developing a data visualization tool that provides a set of interactive layouts conveying information about several water indicators, drawn from historical data and from the predictions of several machine learning models. The goal is an environment that helps water managers understand the water situation in California and gain insight into how it is likely to evolve over time. In this article, we present an overview of our strategy and the main steps in developing the system.

Step 1: Know your users

One of the key principles of data visualization is knowing the users of your product. This knowledge allows the creation of layouts that are both effective and intuitive. It is crucial to identify 1) what analyses users perform on the available data, and how; 2) what they can and cannot do with their current analysis tools; and 3) the main questions they want to answer about the scenario under study. Although these key points are well defined and perhaps even straightforward, identifying them is far from trivial, especially because users are not always aware of what they really want to analyze.

When designing visual analytics systems, designers must build tools that not only answer users’ known questions but also support exploratory analysis, revealing strategic information users were not aware of. This type of analysis enhances their experience and, at the same time, brings important unknown information about the scenario to light.

We therefore intend to conduct several interviews with our potential users in order to identify strengths and limitations in their current analyses, and to gather requirements and key metrics that will make the tool useful to decision makers. In addition, we will present them with initial versions of the system offering simple analysis tools, to foster their analysis and raise new questions. These questions will in turn lead to the development of additional tools and the adjustment of existing ones. This incremental learning and development process helps produce a system that is closely connected to users’ needs.

Step 2: What data is available? What to display?

Based on the initial user requirements, we search for publicly available data, which is also a challenging task. Government and water agencies provide a great deal of information about water behavior, including structural relationships and how that information evolves over time. However, these data are spread across several data sets. It is therefore necessary to organize and process the information in these repositories to build a single data set containing everything relevant to our users. Additional data sources not related to water may also be needed to complement the layouts.
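
Organizing these repositories typically means aligning records from different sources on shared keys. The sketch below assumes, hypothetically, that every source can be reduced to daily records keyed by a station identifier and an ISO date; the names SourceRecord, MergedRow and mergeSources are illustrative only. It performs an outer join, so a day with a precipitation reading but no streamflow reading still appears, with the missing indicator simply absent.

```typescript
// Hypothetical record shape: one source reports one indicator for one station on one day.
interface SourceRecord {
  stationId: string;
  date: string; // ISO date, e.g. "2020-01-15"
  value: number;
}

// One merged row per (station, date); indicators missing from a source are absent.
interface MergedRow {
  stationId: string;
  date: string;
  values: Record<string, number>;
}

// Outer-joins several per-indicator sources on (stationId, date).
// `sources` maps an indicator name (e.g. "streamflow") to that source's records.
function mergeSources(sources: Record<string, SourceRecord[]>): MergedRow[] {
  const rows = new Map<string, MergedRow>();
  for (const [indicator, records] of Object.entries(sources)) {
    for (const r of records) {
      const key = `${r.stationId}|${r.date}`;
      let row = rows.get(key);
      if (row === undefined) {
        row = { stationId: r.stationId, date: r.date, values: {} };
        rows.set(key, row);
      }
      row.values[indicator] = r.value;
    }
  }
  // Order by station, then date, so each station's series is chronological.
  return Array.from(rows.values()).sort((a, b) =>
    a.stationId === b.stationId
      ? a.date.localeCompare(b.date)
      : a.stationId.localeCompare(b.stationId),
  );
}
```

Keeping one row per station and day makes it cheap to feed the same merged table to both the map layers and the time-series charts.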

Step 3: Use the right charts / Provide useful interactions

Once we have a suitable data set, it is time to choose which visualization techniques to use. In this project, a geographical map is an intuitive way to present data to our users, because they are used to working with locations and the relationships among them. We therefore intend to build our tools as overlapping layers on top of a map of California, so that several data aspects can be combined and related to each other, revealing behavior patterns that may yield interesting information. Interacting with visual cues on the map triggers several distinct analyses while keeping the main geographical context always visible.
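
Two pieces make this layered design work: a projection that places each station on the map, and an ordering rule that decides which layer is drawn on top. The sketch below uses a plain equirectangular mapping over an approximate California bounding box (both assumptions made for illustration; a production map would more likely use a conic projection such as D3's d3.geoAlbers) and a hypothetical MapLayer type.

```typescript
// Approximate bounding box of California in degrees (an assumption for illustration).
const CA_BOUNDS = { minLon: -124.48, maxLon: -114.13, minLat: 32.53, maxLat: 42.01 };

// Equirectangular mapping of (lon, lat) onto a width x height SVG canvas.
// Simplified stand-in for a real cartographic projection.
function project(lon: number, lat: number, width: number, height: number): [number, number] {
  const x = ((lon - CA_BOUNDS.minLon) / (CA_BOUNDS.maxLon - CA_BOUNDS.minLon)) * width;
  // SVG y grows downward, so larger latitudes (north) map to smaller y values.
  const y = ((CA_BOUNDS.maxLat - lat) / (CA_BOUNDS.maxLat - CA_BOUNDS.minLat)) * height;
  return [x, y];
}

// Hypothetical layer description: each layer draws one data aspect over the base map.
interface MapLayer {
  id: string;
  zIndex: number; // higher values are drawn later, i.e. on top
  visible: boolean;
}

// Ids of the visible layers in drawing order, bottom to top.
// filter() returns a new array, so sorting it does not mutate the caller's list.
function drawOrder(layers: MapLayer[]): string[] {
  return layers
    .filter((l) => l.visible)
    .sort((a, b) => a.zIndex - b.zIndex)
    .map((l) => l.id);
}
```

Toggling a layer's visible flag, rather than removing it, keeps its position in the stack stable when the user switches it back on.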

We intend to show two distinct data aspects. The first is the structural aspect, which gives an effective view of the phenomenon’s structure in terms of the relationships among its components, represented by indicators such as river flow, air temperature, precipitation, and soil moisture, among others. It helps users understand the correlations among these indicators, how they relate to their locations, and how different locations relate to each other with respect to these indicators. The second is the temporal aspect, which depicts how these indicators evolve over time, highlighting behavior patterns that represent seasonal or abnormal events and suggesting future behavior.
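
Both aspects rest on simple statistics. As a sketch, the structural view can quantify how two indicators move together with the Pearson correlation coefficient, and the temporal view can flag days that deviate from the typical value for their calendar month. The threshold-based anomaly rule and the DailyValue type are assumptions for illustration, not a method the project has committed to.

```typescript
// Pearson correlation between two equally long indicator series,
// e.g. daily precipitation and daily streamflow at one station.
// Returns NaN when either series is constant (zero variance).
function pearson(xs: number[], ys: number[]): number {
  if (xs.length !== ys.length || xs.length < 2) {
    throw new Error("series must have the same length (at least 2)");
  }
  const n = xs.length;
  const meanX = xs.reduce((s, v) => s + v, 0) / n;
  const meanY = ys.reduce((s, v) => s + v, 0) / n;
  let cov = 0, varX = 0, varY = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - meanX;
    const dy = ys[i] - meanY;
    cov += dx * dy;
    varX += dx * dx;
    varY += dy * dy;
  }
  return cov / Math.sqrt(varX * varY);
}

interface DailyValue { date: string; value: number } // date as "YYYY-MM-DD"

// Mean value for each calendar month (index 0 = January) across all years.
function monthlyMeans(series: DailyValue[]): number[] {
  const sums = new Array<number>(12).fill(0);
  const counts = new Array<number>(12).fill(0);
  for (const d of series) {
    const m = Number(d.date.slice(5, 7)) - 1;
    sums[m] += d.value;
    counts[m] += 1;
  }
  return sums.map((s, m) => (counts[m] > 0 ? s / counts[m] : NaN));
}

// Days whose value departs from their month's mean by more than `threshold` (same units).
function seasonalAnomalies(series: DailyValue[], threshold: number): DailyValue[] {
  const means = monthlyMeans(series);
  return series.filter(
    (d) => Math.abs(d.value - means[Number(d.date.slice(5, 7)) - 1]) > threshold,
  );
}
```

Computing pearson for every pair of indicators (or of stations) yields a correlation matrix that could itself be shown as a heatmap next to the map.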

We also propose to integrate the outputs of several prediction models into these layouts, to help users understand the predictions, explain the models’ behavior, and choose the appropriate model configuration for a specific scenario. To connect the map with predicted streamflow data, selecting one or more elements in the layout may trigger, for example, a line chart showing daily streamflow predictions and a bar chart showing streamflow aggregated per year. Line and bar charts are natural choices for displaying information over time. Lastly, a heatmap of the differences between actual and predicted values may help identify outliers in the predictions, as well as significant streamflow variations over time.
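
The data behind the bar chart and the heatmap can be derived from paired daily observations and predictions. This sketch assumes a hypothetical FlowPoint record; it aggregates yearly bars as the sum of predicted daily values (a mean would work equally well) and fills each heatmap cell, one per year and month, with the mean residual, actual minus predicted, so that cells far from zero point to months where a model misses.

```typescript
// Hypothetical record pairing an observed and a predicted daily streamflow value.
interface FlowPoint {
  date: string; // "YYYY-MM-DD"
  actual: number;
  predicted: number;
}

// Bar chart data: sum of predicted daily streamflow per year.
function yearlyTotals(points: FlowPoint[]): Map<number, number> {
  const totals = new Map<number, number>();
  for (const p of points) {
    const year = Number(p.date.slice(0, 4));
    totals.set(year, (totals.get(year) ?? 0) + p.predicted);
  }
  return totals;
}

// Heatmap data: one cell per (year, month) holding the mean residual (actual - predicted).
interface HeatCell { year: number; month: number; meanResidual: number }

function residualHeatmap(points: FlowPoint[]): HeatCell[] {
  const acc = new Map<string, { year: number; month: number; sum: number; n: number }>();
  for (const p of points) {
    const year = Number(p.date.slice(0, 4));
    const month = Number(p.date.slice(5, 7)); // 1 = January
    const key = `${year}-${month}`;
    const cell = acc.get(key) ?? { year, month, sum: 0, n: 0 };
    cell.sum += p.actual - p.predicted;
    cell.n += 1;
    acc.set(key, cell);
  }
  return Array.from(acc.values()).map((c) => ({
    year: c.year,
    month: c.month,
    meanResidual: c.sum / c.n,
  }));
}
```

Positive cells mean the model under-predicted that month and negative cells mean it over-predicted, which a diverging color scale makes easy to read.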

Alongside these visualization techniques, we will provide a set of interaction tools that let users refine the layouts to show what they really need to see. These tools implement basic actions such as selection, panning, and zooming; aggregation operations, such as combining days into weeks and weeks into months; click operations, which generate new layouts focused on specific elements; and hover operations, which display additional information about a layout element.
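
Temporal aggregation is a grouping step: assign each day to a bucket, then combine the values in each bucket. Because weeks straddle month boundaries, this sketch builds monthly buckets directly from daily values rather than from weekly ones. It labels each week by its Monday in UTC and uses the mean as the aggregation function; both choices, and the names aggregate and bucketKey, are assumptions.

```typescript
type Granularity = "day" | "week" | "month";

interface DailyValue { date: string; value: number } // date as "YYYY-MM-DD"

// Key of the bucket a date falls into at the given granularity.
function bucketKey(isoDate: string, granularity: Granularity): string {
  if (granularity === "day") return isoDate;
  if (granularity === "month") return isoDate.slice(0, 7); // "YYYY-MM"
  // Week: label each week by its Monday, computed in UTC so local time zones cannot shift it.
  const d = new Date(`${isoDate}T00:00:00Z`);
  const daysSinceMonday = (d.getUTCDay() + 6) % 7; // getUTCDay(): 0 = Sunday
  d.setUTCDate(d.getUTCDate() - daysSinceMonday);
  return d.toISOString().slice(0, 10);
}

// Aggregates a daily series to the requested granularity using the mean of each bucket.
function aggregate(series: DailyValue[], granularity: Granularity): DailyValue[] {
  const buckets = new Map<string, { sum: number; n: number }>();
  for (const p of series) {
    const key = bucketKey(p.date, granularity);
    const b = buckets.get(key) ?? { sum: 0, n: 0 };
    b.sum += p.value;
    b.n += 1;
    buckets.set(key, b);
  }
  // ISO-style keys sort chronologically as plain strings.
  return Array.from(buckets.entries())
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([date, b]) => ({ date, value: b.sum / b.n }));
}
```

Since the output has the same shape as the input, the same line or bar chart component can redraw whatever granularity the user selects.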

Step 4: Pick the right tools

To develop this web visualization tool, React is an easy choice: it is one of the most popular libraries for building user interfaces and lets developers create fast, scalable, and simple single-page applications. With React Hooks, we expect that keeping track of the current application state will be straightforward.
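
One common way to keep linked views consistent with Hooks is a single reducer passed to React's useReducer. The state fields, action names, and zoom limits below are hypothetical; the point is the pattern, a pure function that returns a new state object for each user action, which can be tested without rendering anything.

```typescript
// Hypothetical shared UI state read by the map and the linked charts.
interface ViewState {
  selectedStations: string[];
  granularity: "day" | "week" | "month";
  zoom: number;
}

type Action =
  | { type: "toggleStation"; stationId: string }
  | { type: "setGranularity"; granularity: ViewState["granularity"] }
  | { type: "zoomBy"; factor: number };

const initialState: ViewState = { selectedStations: [], granularity: "day", zoom: 1 };

// Pure reducer: always returns a new object and never mutates the previous state,
// which is what React's useReducer relies on to detect changes.
function viewReducer(state: ViewState, action: Action): ViewState {
  switch (action.type) {
    case "toggleStation": {
      const selected = state.selectedStations.includes(action.stationId)
        ? state.selectedStations.filter((id) => id !== action.stationId)
        : [...state.selectedStations, action.stationId];
      return { ...state, selectedStations: selected };
    }
    case "setGranularity":
      return { ...state, granularity: action.granularity };
    case "zoomBy":
      // Clamp to an arbitrary 1x..20x range chosen for illustration.
      return { ...state, zoom: Math.min(20, Math.max(1, state.zoom * action.factor)) };
  }
}
```

Inside a component, `const [state, dispatch] = useReducer(viewReducer, initialState);` would expose this state to every view, and a click on a station could call `dispatch({ type: "toggleStation", stationId })`.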

To create custom interactive visualizations on the web page, D3 is our tool of choice, as it gives developers control over every aspect of user interaction.
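
Much of that control comes from scales that map data values to pixels and back. In a D3 application one would use d3.scaleLinear() with .domain(), .range() and .invert(), plus d3.bisector for lookups; the dependency-free sketch below re-implements the same mechanics to show how a hover tooltip on a line chart can find the data point under the mouse. The function names are illustrative, not D3's.

```typescript
// Minimal stand-in for d3.scaleLinear(): maps a data domain onto a pixel range and back.
function linearScale(domain: [number, number], range: [number, number]) {
  const [d0, d1] = domain;
  const [r0, r1] = range;
  const scale = (v: number): number => r0 + ((v - d0) / (d1 - d0)) * (r1 - r0);
  // invert() turns a pixel position (e.g. the mouse x) back into a data value.
  scale.invert = (px: number): number => d0 + ((px - r0) / (r1 - r0)) * (d1 - d0);
  return scale;
}

// Index of the point closest to x in an ascending array, via binary search.
// Returns -1 for an empty array.
function nearestIndex(sortedXs: number[], x: number): number {
  if (sortedXs.length === 0) return -1;
  let lo = 0;
  let hi = sortedXs.length - 1;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (sortedXs[mid] < x) lo = mid + 1;
    else hi = mid;
  }
  // lo is the first index with value >= x (or the last index); the previous point may be closer.
  if (lo > 0 && x - sortedXs[lo - 1] <= sortedXs[lo] - x) return lo - 1;
  return lo;
}
```

On mousemove, inverting the pointer's x coordinate and passing it to nearestIndex gives the day whose predicted and observed values the tooltip should display.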

Finally, we chose Heroku to host the first versions of the system in the cloud. It supports building, running, and operating applications, and in our initial tests it proved satisfactory for delivering early versions of the system to users. We intend to move to more robust options as the system grows in features and number of users.

Final words

Continuous feedback from the HydroDetectus team has helped us improve our visual strategies. We are now preparing the initial version of the system to present to users and collect their first feedback.
