We live in a world where data drives our decisions, actions, and future outcomes, which makes understanding it essential. Judging the difference between good data and bad data can be subjective, shaped by the many variables that influence our understanding and decision-making. It is important to understand the different types of data and how they can be used in the stormwater industry, so let us dig in.
Estimated and collected data
Estimated data is an inference from incomplete information. An estimate can range from a simple guess based on past experience to the output of complex software tools that infer the most likely values for unknowns. Either way, it is still an estimate; filling in missing data points from historical records should be taken with a grain of salt.
Collected data has been physically gathered from a location or device, such as a sensor, meter, or probe. Collection is typically done with a mechanical device or by physically gathering material for analysis, and it can be as complex as a SCADA network or as simple as a tape measure or grab sample.
When comparing the accuracy of these data types, collected data will usually be more accurate than estimated data. However, an estimate can be established as good when collected data matches and verifies it.
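That verification step can be sketched in code: accept an estimate only when a later field measurement falls within a chosen tolerance of it. The 10% tolerance and the sample flow values below are illustrative assumptions, not a standard:

```python
# Minimal sketch: flag an estimated value as "verified" only when a later
# field measurement falls within a chosen tolerance of it. The 10% default
# tolerance and the sample numbers are illustrative, not an industry rule.

def verify_estimate(estimated, collected, tolerance=0.10):
    """Return True if the collected value confirms the estimate."""
    if collected == 0:
        return estimated == 0
    relative_error = abs(estimated - collected) / abs(collected)
    return relative_error <= tolerance

# Hypothetical example: an estimated peak flow vs. a later gauge reading (cfs)
print(verify_estimate(estimated=145.0, collected=150.0))  # within 10% -> True
print(verify_estimate(estimated=110.0, collected=150.0))  # off by ~27% -> False
```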
Simulated and modeled data
Another form of data comes from simulation and modeling. Simulated data can range from a physical scale model of a feature such as a stream, culvert, or dam that duplicates the site, to data replication that uses large amounts of data to mimic real-world conditions.
Modeled data takes electronic data and compiles it into a computer or mathematical model. For stormwater, this is typically done for drainage basins, floodplain analysis, dams, culverts, and similar structures, using anywhere from dozens of data points to tens of thousands, or possibly millions, to arrive at a range of outcomes.
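As a deliberately simple sketch of a mathematical model, the widely used rational method estimates peak runoff from a small drainage area as Q = C × i × A (runoff coefficient, rainfall intensity, area). The subbasin values below are invented for illustration; real basin models use far more data points:

```python
# A deliberately tiny "model": the rational method, Q = C * i * A, which
# estimates peak runoff (cfs) from a runoff coefficient C, rainfall intensity
# i (in/hr), and drainage area A (acres). The subbasin values are invented.

def rational_peak_flow(c, intensity_in_hr, area_acres):
    """Peak flow in cfs (the unit conversion factor of ~1.008 is taken as 1)."""
    return c * intensity_in_hr * area_acres

# Hypothetical subbasins: (runoff coefficient, intensity in/hr, area in acres)
subbasins = [(0.85, 2.0, 5.0), (0.30, 2.0, 12.0)]
total = sum(rational_peak_flow(c, i, a) for c, i, a in subbasins)
print(round(total, 1))  # 0.85*2*5 + 0.30*2*12 = 8.5 + 7.2 = 15.7 cfs
```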
Historical data
Historical data is data collected previously for specific reasons. For stormwater, this is typically data collected before the current quarter, and it may go back several or even dozens of years. Examples of national historical data can be found at the U.S. Geological Survey (USGS) or the National Oceanic and Atmospheric Administration (NOAA).
The importance of historical data cannot be overstated. Knowing what happened yesterday, and what has changed since then, can be applied to what is happening today. Understanding how the stormwater network developed over time, and how and where ponds, culverts, and other structures were sized, sheds light on previous decisions and the data behind them.
The accuracy of historical data does need to be confirmed before it is used for decision-making on any new events. It is also important to review the storm data used to design existing structures to determine whether redesign or replacement is needed based on current storm data and future projections.
Calibration of data
Calibration typically refers to reproducing the same conditions repeatedly to get the same results, or to adjusting a model's parameters so that its output, given the input parameters, matches empirical data as closely as possible.
Calibration can take several forms, from using calibration solutions to readjust instruments, to taking additional field measurements to add more data points. Calibration ensures that the data being used corresponds as closely to the real world as possible. This can be a simple mechanical calibration of equipment, or it could require hundreds of additional data points to check the data against.
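As a toy example of parameter calibration, the sketch below sweeps a single runoff coefficient and keeps the value whose predictions best match a handful of observed flows. The one-parameter model and the observed values are invented for illustration; real calibration involves many more parameters and data points:

```python
# A hedged sketch of parameter calibration: sweep a single runoff coefficient
# and keep the value whose predictions best match observed flows (lowest RMSE).
# The observed flows and rainfall figures below are made up for illustration.
import math

observed_cfs = [4.2, 8.9, 13.5]                        # hypothetical observed peaks
intensity_area = [(1.0, 5.0), (2.1, 5.0), (3.2, 5.0)]  # (in/hr, acres) per event

def rmse(c):
    """Root-mean-square error of the simple model Q = c * i * a vs. observations."""
    errs = [(c * i * a - q) ** 2 for (i, a), q in zip(intensity_area, observed_cfs)]
    return math.sqrt(sum(errs) / len(errs))

# Grid search over candidate coefficients from 0.10 to 0.99
best_c = min((round(c * 0.01, 2) for c in range(10, 100)), key=rmse)
print(best_c)
```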
Calibrating data points can be challenging because identical simulations or situations are needed to properly compare data points to one another. For example, calibrating against two five-year storms in the same drainage basin would require the same rainfall over the same area, timeframe, and intensity, so that the hydrographs for the two storms match. If the storms have different intensities or timeframes, or start in different parts of the drainage basin, they are not suitable for calibration against each other. The additional data points do, however, create additional observations and estimates from the two storms, and they can be combined with other data for verification and validation.
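When hydrographs do line up, the agreement can be quantified. One widely used goodness-of-fit measure in hydrology is the Nash-Sutcliffe efficiency (NSE), where 1.0 is a perfect match and values near or below zero mean the simulation does no better than the observed mean. The hydrograph values below are invented for illustration:

```python
# Nash-Sutcliffe efficiency: 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2).
# A standard hydrograph goodness-of-fit metric; the data below is invented.

def nse(observed, simulated):
    mean_obs = sum(observed) / len(observed)
    num = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    den = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - num / den

observed = [2.0, 6.0, 14.0, 9.0, 4.0]   # hypothetical storm hydrograph (cfs)
simulated = [2.5, 5.5, 13.0, 9.5, 4.5]  # model output for the same storm
print(round(nse(observed, simulated), 3))  # close to 1.0 -> good match
```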
Calibration of any data or field instrument is vital to data accuracy. Without calibrated equipment, the data collected will not be accurate, decisions made from that data will suffer, and uncalibrated data can mislead decision-makers and cost money to correct later.
Potential pitfalls of data
The drive toward automation and advances in low-cost sensor technology have left many organizations overwhelmed with data and unsure where to start. More data has been collected in the last several years than in all previous human history. This is one of the hardest parts of working with data: knowing where to start and how to proceed.
A basic rule of thumb is to first understand the overall picture of the system your data describes. Is it a watershed? A distribution system? What is it, primarily? Then choose a logical starting point based on your objective. For a watershed, start at the sample location furthest away and work across the watershed; for a piping network, start at a dead-end pipe and proceed toward the plant or discharge. Watershed delineation may begin with topographic maps to determine the watershed boundaries, which can then be divided into subbasins as runoff characteristics are identified and applied to the data. Several iterations may be needed, and it is OK to start over; the knowledge gained from previous attempts will speed up the process.
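The "start at the dead ends and work toward the discharge" idea can be sketched as a topological traversal of the pipe network, visiting each node only after all of its upstream neighbors. The node names below form a hypothetical network, not any real system:

```python
# Sketch of ordering a pipe network from dead ends to the discharge point.
# Pipes are (upstream, downstream) pairs; nodes are processed in topological
# order so nothing is visited before its upstream inflows. Names are invented.
from collections import deque

pipes = [("A", "C"), ("B", "C"), ("C", "D"), ("D", "outfall")]

downstream = {}
indegree = {}
for up, down in pipes:
    downstream.setdefault(up, []).append(down)
    indegree[down] = indegree.get(down, 0) + 1
    indegree.setdefault(up, 0)

# Dead-end pipes have no upstream inflow (indegree 0), so start there.
queue = deque(sorted(n for n, d in indegree.items() if d == 0))
order = []
while queue:
    node = queue.popleft()
    order.append(node)
    for nxt in downstream.get(node, []):
        indegree[nxt] -= 1
        if indegree[nxt] == 0:
            queue.append(nxt)

print(order)  # ['A', 'B', 'C', 'D', 'outfall']
```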
Some questions you should ask when analyzing your data:
- Is the data time-dependent, such as a storm gauge?
- Is it a static or dynamic reading?
- Is it recurring data?
- Can recurring data be parsed into smaller pieces?
- What were the field and weather conditions?
- Which areas need the most/least attention?
- Are you dealing with historical data vs. current data?
No matter how much data you have or how it was collected, the human element (humint) cannot be removed from the equation. Software can accurately simulate and even forecast the future, but it is limited by the controls and alerts imposed by the user. Software can alert users to numbers outside a specific range, but what about when a number stays within that tolerance yet still signals a problem? It is easy to picture this with pH or other parameters, but for stormwater, it is really at the control structures where humint meets the proverbial road.
For example, a culvert sized and installed in the 1980s, while entirely adequate for that period, may now be undersized under current weather conditions. Software may catch this problem, but it is just as likely that the new tailwater conditions will fall outside its purview: the tailwater elevation is calculated, but where the tailwater actually goes lies outside the software's scope. Here, a human who understands the tailwater conditions can act to offset any potential damage.
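A rough capacity re-check of an older culvert can be sketched with Manning's equation for a circular pipe flowing full, Q = (1.49/n) · A · R^(2/3) · √S, compared against an older and a newer design flow. The diameter, slope, roughness, and both design flows below are invented values for illustration, not a design procedure:

```python
# Manning's equation (US units) for a circular culvert flowing full:
# Q = (1.49 / n) * A * R^(2/3) * sqrt(S). All input values are invented.
import math

def full_flow_capacity_cfs(diameter_ft, slope, n):
    area = math.pi * diameter_ft ** 2 / 4  # flow area, sq ft
    hyd_radius = diameter_ft / 4           # A / wetted perimeter for full flow
    return (1.49 / n) * area * hyd_radius ** (2 / 3) * math.sqrt(slope)

capacity = full_flow_capacity_cfs(diameter_ft=3.0, slope=0.01, n=0.013)
print(round(capacity, 1))  # roughly 66.9 cfs for these inputs
print(capacity >= 60.0)    # hypothetical 1980s design flow -> adequate then
print(capacity >= 85.0)    # hypothetical current design flow -> undersized now
```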
Understanding the differences, pitfalls, and potential of data is critical to protecting communities from major weather events. Much time and money are spent on predicting rainfall, but allocating resources to predicting where the runoff will go is equally, perhaps more, important. Tools are only as good as the humans who oversee them, so the better you know your data, the better the analytics will be.