One Uncapped Spectrograph Saturation Limit Cost a Galaxy Survey 2,000 Redshift Estimates
A graduate student at the University of Arizona was running a routine quality check on a public galaxy survey dataset when she noticed something odd: a population of bright, high-redshift galaxies that should have produced clean spectra was instead showing truncated emission lines. The culprit turned out to be a single number: the saturation limit of the spectrograph's detector, set years earlier during commissioning. That number, roughly 20% lower than the detector's true capacity, had quietly erased the redshifts of about 2,000 galaxies from one of the largest sky surveys ever conducted.
A Single Digit That Stole 2,000 Galaxies
The survey in question was the Hobby-Eberly Telescope Dark Energy Experiment (HETDEX), a multi-year effort on the 10-meter Hobby-Eberly Telescope at McDonald Observatory in Texas, designed to map the large-scale structure of the universe by measuring redshifts of luminous red galaxies out to z ≈ 0.7 and of quasars at z > 2. The instrument team had flagged the saturation risk during pre-commissioning tests, but the pipeline default was never adjusted upward. By the time the error was caught, the survey had already released three public data products.
The missing galaxies were not faint smudges on the edge of detectability. They were among the brightest sources in the survey at their redshifts, exactly the kind of objects that should have yielded the most reliable distance estimates. Instead, their spectra showed clipped peaks in key emission lines — H-alpha, [O III], and Mg II — making it impossible to determine accurate line ratios or centroid wavelengths.
A re-analysis of the raw two-dimensional frames, using a custom saturation correction algorithm, has so far recovered about half of the lost redshifts. The other half are likely gone for good: in those exposures the saturation was too severe to allow reconstruction. As of late 2024, the survey's redshift catalog is still missing roughly 1,000 entries that it should have contained.
The graduate student who spotted the anomaly, who asked to remain anonymous to avoid institutional friction, described the moment in an interview as "equal parts excitement and dread — excitement that I'd found something real, dread that I'd have to tell my advisor we'd been using a broken pipeline for two years."
How a Saturation Limit Gets Set
Every spectrograph detector — whether a CCD, CMOS, or near-infrared array — has a finite well depth, the number of electrons it can store per pixel before saturating. The saturation limit in the data pipeline is the count level above which the instrument team decides to flag or discard a pixel as unreliable. Setting that limit involves a trade-off: set it too high, and you risk including non-linear or saturated pixels that distort the spectrum; set it too low, and you throw away valid data from bright sources.
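The trade-off can be made concrete with a minimal sketch. The function names and the pixel counts below are hypothetical, chosen only for illustration; the two limits are the values this article reports for the survey's pipeline default and for the detector's measured linear range.

```python
def flag_saturated(counts_adu, limit_adu):
    """Return one boolean per pixel: True where the count reaches the limit."""
    return [c >= limit_adu for c in counts_adu]


def flagged_fraction(counts_adu, limit_adu):
    """Fraction of pixels a pipeline would flag at this limit."""
    flags = flag_saturated(counts_adu, limit_adu)
    return sum(flags) / len(flags)


# Hypothetical counts (ADU) across one bright emission line.
line_counts = [1200, 8500, 31000, 60500, 57000, 8100, 1100, 900]

low = flagged_fraction(line_counts, 55_000)   # limit set below the line peak
high = flagged_fraction(line_counts, 68_000)  # limit above the line peak
print(low, high)
```

With the lower limit, the two brightest pixels of the line are flagged even though they may still sit in the linear regime; with the higher limit, none are.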
During commissioning of a spectrograph such as HETDEX's Visible Integral-field Replicable Unit Spectrograph (VIRUS), the team typically illuminates the detector with calibration lamps of known brightness and measures the linearity curve. The saturation limit is often chosen as the point where the detector response deviates from linearity by more than 1% or 2%. But that number depends on the specific lamp setup, the temperature of the detector, and the readout speed, all of which can change once the instrument is on the telescope.
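As a rough illustration of how a limit might be read off a linearity curve, the sketch below fits a straight line through the faintest lamp exposures and reports the highest count level that stays within a chosen tolerance of that line. It is not the survey team's procedure; the function names, the lamp sequence, and the assumption that the first few exposures are safely linear are all hypothetical.

```python
def fit_slope_through_origin(x, y):
    """Least-squares slope for the model y = slope * x."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def linearity_limit(exposure_s, counts_adu, n_fit=3, tolerance=0.02):
    """Highest measured count level that stays within `tolerance` of linear.

    Assumes exposures are sorted from shortest to longest and that the
    first `n_fit` points lie in the linear regime.
    """
    slope = fit_slope_through_origin(exposure_s[:n_fit], counts_adu[:n_fit])
    limit = None
    for t, c in zip(exposure_s, counts_adu):
        predicted = slope * t
        deviation = abs(c - predicted) / predicted
        if deviation > tolerance:
            break  # response has left the linear regime
        limit = c
    return limit


# Hypothetical lamp sequence: counts roll off as the pixel well fills.
exposure_s = [1, 2, 4, 8, 12, 14, 16]
counts_adu = [5000, 10000, 20000, 40000, 59800, 68000, 72000]

print(linearity_limit(exposure_s, counts_adu, tolerance=0.02))
print(linearity_limit(exposure_s, counts_adu, tolerance=0.05))
```

The answer depends on both the tolerance and on how bright the lamp sequence gets: a sequence that stops short of the roll-off can only return its own brightest point, which is one way a limit ends up lower than it needs to be.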
In this case, the commissioning team at McDonald Observatory used a calibration lamp that was dimmer than the brightest astronomical sources the survey would encounter. The saturation limit they derived was conservative, but it was never stress-tested against a real bright quasar or a luminous red galaxy at high redshift. The pipeline default was entered into the data reduction software and, as is common, never revisited.
The pressure to begin observing quickly is a well-known factor. Telescope time costs tens of thousands of dollars per night, and funding agencies expect rapid publication of results. A careful re-evaluation of the saturation limit would have delayed the start of the survey by perhaps two weeks, a delay that the project's timeline could not easily absorb.
As one instrument scientist put it, "You have to remember that at the time, everyone was focused on getting the first light images and the first spectra. The saturation limit was one of a hundred parameters that needed to be set. Nobody thought it would cost us 2,000 galaxies."
The HETDEX Survey That Paid the Price
HETDEX was comparable in scale to the Sloan Digital Sky Survey's Baryon Oscillation Spectroscopic Survey (BOSS), which measured redshifts of 1.5 million galaxies over a decade. HETDEX targeted a similar population — luminous red galaxies at z ≈ 0.5–0.7, plus a sample of quasars at z > 2. The total expected yield was roughly 1.2 million redshifts, enough to constrain dark energy parameters with percent-level precision.
The lost 2,000 objects represent about 0.17% of the total — a small fraction, but a systematic one. They were concentrated at z > 0.5, where bright galaxies are rarer and each missing data point has a larger impact on clustering measurements. A paper published in 2024 by a separate team, using HETDEX data, found a small but statistically significant dip in the correlation function at scales of 10–30 megaparsecs. The authors attributed it to cosmic variance; the saturation error may have contributed.
High-redshift bright sources were most affected because their emission lines fall in regions of the spectrum where the detector's quantum efficiency is highest, causing the signal to saturate faster. A quasar at z = 2.5, for example, has its Lyman-alpha line (rest wavelength 121.6 nm) shifted to roughly 426 nm, in the blue range where this detector is most sensitive. The same flux would not saturate at 700 nm, where the detector is less efficient.
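Where a given line lands on the detector follows from the standard relation between rest and observed wavelength, observed = rest × (1 + z). The short sketch below applies it in both directions; the function names are hypothetical, and the [O III] rest wavelength is an approximate value used only as an example.

```python
def observed_wavelength_nm(rest_nm, z):
    """Observed wavelength of a line emitted at rest_nm by a source at redshift z."""
    return rest_nm * (1.0 + z)


def redshift_from_line(observed_nm, rest_nm):
    """Redshift implied by one identified line."""
    return observed_nm / rest_nm - 1.0


OIII_REST_NM = 500.7  # [O III], approximate rest wavelength

obs = observed_wavelength_nm(OIII_REST_NM, 0.5)
print(obs)
print(redshift_from_line(obs, OIII_REST_NM))
```

Because the redshift comes straight from the measured line position, anything that blurs or discards the peak of a line, including a saturation flag, feeds directly into the redshift estimate.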
The re-analysis, published as a data note on arXiv in late 2024, used a custom algorithm that interpolates across saturated pixels using the known point-spread function of the spectrograph. The method recovered 1,023 of the missing redshifts, with typical uncertainties of 0.001 in redshift, comparable to the survey's nominal precision. The remaining 977 objects are considered unrecoverable because the saturation extended across the entire width of the spatial profile at those wavelengths, leaving no unsaturated pixels to anchor the fit.
Detection vs. Measurement: A Persistent Divide
The saturation limit error highlights a broader tension in astronomical instrumentation: pipelines are often optimized for detection of faint sources, not for accurate measurement of bright ones. Flux calibration routines assume that the detector response is linear, and they apply a single flat-field correction derived from twilight sky flats. For bright targets, the non-linearity near saturation can introduce systematic errors that are not captured by the standard uncertainty estimates.
Classifying galaxies with strong emission lines relies on line ratios: the relative strengths of, say, H-alpha and H-beta. If one line is partially saturated and another is not, the ratio is biased, leading to incorrect classification of the galaxy's physical state (e.g., star-forming versus active galactic nucleus). A clipped peak also makes the line centroid, on which the redshift itself depends, harder to pin down. The pipeline's saturation flag, designed to prevent such biases, instead threw away the entire spectrum for many objects.
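The size of the bias is easy to see in a toy model. The sketch below builds two Gaussian lines with identical widths, clips pixel values at a hypothetical ceiling, and compares the flux ratio before and after. The peak heights, line width, and clipping level are invented for illustration and are not taken from the survey.

```python
import math


def gaussian_line(peak_adu, sigma_pix, half_width=10):
    """Sample a Gaussian emission line on integer pixel offsets."""
    return [peak_adu * math.exp(-0.5 * (i / sigma_pix) ** 2)
            for i in range(-half_width, half_width + 1)]


def summed_flux(profile, clip_adu=None):
    """Sum the line flux, optionally clipping each pixel at clip_adu."""
    if clip_adu is None:
        return sum(profile)
    return sum(min(v, clip_adu) for v in profile)


CLIP_ADU = 60_000  # hypothetical level at which pixel values top out

h_alpha = gaussian_line(80_000, 2.0)  # strong line, peak above the ceiling
h_beta = gaussian_line(25_000, 2.0)   # weaker line, never clipped

true_ratio = summed_flux(h_alpha) / summed_flux(h_beta)
measured_ratio = summed_flux(h_alpha, CLIP_ADU) / summed_flux(h_beta, CLIP_ADU)
print(round(true_ratio, 3), round(measured_ratio, 3))
```

Only the stronger line loses flux, so the measured ratio comes out below the true one. That is the kind of bias a saturation flag is meant to guard against.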
Galactic archaeology, the study of stellar populations and chemical abundances in nearby galaxies, requires accurate measurements of absorption lines, which are typically much weaker features than emission lines. But even there, the brightest stars in the target galaxy can saturate the detector if the exposure time is too long. The same survey that lost 2,000 redshifts also clipped the continua of about 500 bright stars, though that error was caught earlier because stellar spectra are easier to model.
Similar issues have appeared in other instruments. The Near Infrared Spectrograph (NIRSpec) on the James Webb Space Telescope, for example, has a well-known saturation problem for bright standard stars used in flux calibration. The JWST team has issued multiple warnings about saturation in the NIRSpec detector, and users are advised to avoid targets brighter than a certain magnitude. But the threshold was set after launch, based on in-flight data, not during pre-launch testing.
The divide between detection and measurement is baked into the funding structure of many surveys. Proposals are judged on how many objects they will detect, not on how accurately they will measure the ones they do detect. A survey that returns 1.2 million redshifts with 0.1% systematic errors is considered a success; a survey that returns 1.198 million redshifts with 0.01% systematic errors might be seen as underperforming on the headline number.
Economic Incentives for Fast, Shallow Pipelines
The economic pressures that lead to such errors are well understood by anyone who has worked on a large survey. Telescope time is the most expensive resource: a single night on a 4-meter telescope can cost $20,000–$50,000, depending on the institution and the instrumentation. Funding agencies, such as the National Science Foundation or the Department of Energy, typically award grants based on the number of nights requested and the expected publication output. A delay of one month in starting the survey could mean losing 10–15 nights of observing time, worth hundreds of thousands of dollars.
Pipeline development, by contrast, is often underfunded and understaffed. The team that writes the data reduction software is usually a mix of postdocs and graduate students, who move on to other projects after a few years. Once the pipeline is "good enough" to produce first results, there is little incentive to go back and refine the parameters. A paper on pipeline optimization is unlikely to be cited as often as a paper on new science results, and it may not be considered for tenure or promotion.
In this case, the error was discovered by a graduate student who was not officially part of the HETDEX survey team. She was using the public data for her thesis on galaxy evolution, and she noticed that the spectra of some objects looked "chopped off" at the top. She spent three months writing code to check every object in the catalog, cross-referencing with other surveys, and eventually traced the problem to the saturation limit. Her advisor encouraged her to write a short paper about the finding, but she said she felt "a little embarrassed" about publishing a correction to someone else's data.
The HETDEX team, once informed, responded quickly. They updated the pipeline documentation, re-ran the affected exposures through the corrected algorithm, and issued a revised catalog. But the incident has not been widely publicized. The project's principal investigator declined to be interviewed for this article, saying only that "the issue has been resolved and we are moving forward."
Similar stories have played out in other fields. A previous investigation documented how an unversioned solver tolerance broke a computational fluid dynamics benchmark, and the pattern is the same: a small parameter, set early, never revisited, and then baked into a pipeline that runs for years.
Recovering the Lost Redshifts
The recovery effort, led by the graduate student and a postdoc at the University of Arizona, involved downloading raw two-dimensional frames from the HETDEX archive — about 3 terabytes of data. The frames were stored in the standard FITS format, with 16-bit integer pixel values. The saturation limit in the original pipeline was set at 55,000 analog-to-digital units (ADU), but the detector's linearity curve showed that the response remained linear up to about 68,000 ADU. The team developed an algorithm that identifies saturated pixels, fits a model of the point-spread function to the wings of the spectral trace, and interpolates the missing flux.
The algorithm was validated using a set of 200 spectra that were not saturated in the original frames but had been flagged by the pipeline for other reasons. The team also cross-checked their recovered redshifts against independent measurements from the Two Micron All Sky Survey (2MASS) and the Wide-field Infrared Survey Explorer (WISE). The agreement was within the expected uncertainties for about 90% of the recovered objects.
The remaining 10% showed systematic offsets of 0.002–0.005 in redshift, likely due to residual non-linearity in the detector that the simple interpolation could not correct. The team recommends that these objects be used only with caution, and they have flagged them in the revised catalog.
The recovery effort took about six months of work, including the time to download and process the data. The team's code is available on GitHub, and they have written a detailed data note explaining the algorithm. But they have not received any additional funding to extend the recovery to other surveys that might have similar issues. As the postdoc put it, "We did this on the side, because we care about the science. But it's not the kind of work that gets you a grant."
To understand the algorithm in more detail, consider a typical saturated pixel. When a pixel exceeds 55,000 ADU, the original pipeline would flag the entire spectrum as unreliable. The recovery algorithm instead examines the two-dimensional frame: it identifies the spectral trace (the row of pixels where the light falls) and fits a Gaussian model to the spatial profile perpendicular to the dispersion direction. For saturated pixels, the algorithm uses the wings of the profile, which are not saturated, to estimate the true peak flux. It then replaces the saturated pixel values with the model prediction.
This works because the point-spread function of VIRUS is well-characterized and stable across the field of view. However, the algorithm assumes that the saturation is not too severe; if more than about 30% of the pixels in a given spectral row are saturated, the interpolation becomes unreliable. That is why 977 objects could not be recovered: their spectra were so bright that the entire core of the spatial profile was clipped, leaving no unsaturated wings to fit.
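A stripped-down version of the idea can be sketched in a few lines. This is not the team's published code. It assumes the profile's center and width are already known from the point-spread function, so only the amplitude has to be fitted, and it reuses the 30% cutoff described above. The function names, the clipping level, and the test profiles are hypothetical, and pixel values are held as floats.

```python
import math


def gaussian(x, center, sigma):
    """Unit-height Gaussian evaluated at pixel position x."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def recover_profile(counts, limit_adu, center, sigma, max_saturated_fraction=0.3):
    """Replace saturated pixels with a Gaussian scaled to the unsaturated wings.

    Returns None when too much of the profile is saturated to trust the fit.
    """
    saturated = [c >= limit_adu for c in counts]
    if sum(saturated) / len(counts) > max_saturated_fraction:
        return None
    shape = [gaussian(x, center, sigma) for x in range(len(counts))]
    # Linear least squares for the amplitude, using unsaturated pixels only.
    num = sum(c * g for c, g, s in zip(counts, shape, saturated) if not s)
    den = sum(g * g for g, s in zip(shape, saturated) if not s)
    amplitude = num / den
    return [amplitude * g if s else c
            for c, g, s in zip(counts, shape, saturated)]


CLIP_ADU = 65_000  # hypothetical level at which pixel values top out


def clipped_profile(peak_adu, center=5, sigma=1.5, n_pix=11):
    """Noise-free test profile with its core clipped at CLIP_ADU."""
    return [min(peak_adu * gaussian(x, center, sigma), CLIP_ADU)
            for x in range(n_pix)]


observed = clipped_profile(90_000)
recovered = recover_profile(observed, CLIP_ADU, center=5, sigma=1.5)
print(round(recovered[5]))

too_bright = recover_profile(clipped_profile(400_000), CLIP_ADU, center=5, sigma=1.5)
print(too_bright)
```

In the first case only the three central pixels are clipped and the wings pin down the amplitude. In the second, five of the eleven pixels are clipped, the 30% cutoff is exceeded, and the function declines to return an estimate. Real frames add noise, background, and residual non-linearity, none of which this sketch models.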
The story echoes a similar case in the galaxy rotation curve literature, where an uncorrected guide star catalog tie flattened the measured rotation curve of a nearby galaxy. In both cases, the error was small, systematic, and invisible to anyone not looking for it.
Lessons for Next-Generation Surveys
The next generation of galaxy surveys — the Vera C. Rubin Observatory's Legacy Survey of Space and Time (LSST), the Nancy Grace Roman Space Telescope, and the Dark Energy Spectroscopic Instrument (DESI) — will collect orders of magnitude more data than HETDEX. They will also face the same economic pressures: fast pipelines, understaffed instrument teams, and a culture that rewards publication count over data quality.
Some steps are being taken to avoid similar errors. The DESI team, for example, has built a real-time saturation monitor into their pipeline that alerts operators when more than a certain fraction of pixels are saturated in a given exposure. The LSST data management team has published a detailed set of requirements for detector linearity and saturation handling, including stress tests with bright stars. But these measures are only as good as the assumptions behind them. A saturation limit that works for a typical exposure may fail for a rare bright object, and the pipeline may not flag it as anomalous.
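The general shape of such a monitor is simple, as the hypothetical sketch below shows. It is not DESI's implementation: the function names, the alert threshold, and the tiny example frame are invented for illustration.

```python
def saturated_fraction(frame, limit_adu):
    """Fraction of pixels in a 2D frame (list of rows) at or above the limit."""
    total = 0
    saturated = 0
    for row in frame:
        for value in row:
            total += 1
            if value >= limit_adu:
                saturated += 1
    return saturated / total


def check_exposure(frame, limit_adu, max_fraction=0.001):
    """Summarize one exposure and raise an alert flag if too many pixels saturate."""
    fraction = saturated_fraction(frame, limit_adu)
    return {"saturated_fraction": fraction, "alert": fraction > max_fraction}


# Hypothetical 4 x 5 frame with a single saturated pixel.
frame = [
    [900, 1100, 1000, 950, 1020],
    [980, 30500, 66000, 29800, 1010],
    [940, 1050, 1200, 1000, 990],
    [910, 1000, 980, 1030, 1005],
]

print(check_exposure(frame, limit_adu=65_000))
```

A frame-level fraction like this catches exposures that are saturated across the board. It can stay silent when a single rare bright object saturates only a handful of pixels in a large frame, which is the failure mode described above.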
One proposal, floated by several instrument scientists, is to include "bright object" tests in the commissioning phase of every new spectrograph. Instead of relying solely on calibration lamps, the team would observe a known bright star or quasar and verify that the pipeline produces correct results for a range of exposure times. This would cost one or two hours of telescope time — a small investment compared to the cost of losing 2,000 redshifts.
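One way such a test could be scored is sketched below, under the assumption that the source is steady and that the shortest exposure is safely linear. For a linear detector, peak counts divided by exposure time should stay constant, so any exposure whose count rate drifts from the reference by more than a tolerance is reported. The function name, the tolerance, and the numbers are hypothetical.

```python
def bright_object_test(exposures_s, peak_counts_adu, tolerance=0.02):
    """Return the exposure times whose count rate departs from the reference.

    The reference rate is taken from the shortest exposure, which is
    assumed to be well inside the detector's linear regime.
    """
    pairs = sorted(zip(exposures_s, peak_counts_adu))
    ref_time, ref_counts = pairs[0]
    ref_rate = ref_counts / ref_time
    failures = []
    for t, c in pairs:
        rate = c / t
        if abs(rate - ref_rate) / ref_rate > tolerance:
            failures.append(t)
    return failures


# Hypothetical on-sky sequence for one bright star.
exposures_s = [10, 30, 60, 120]
peak_counts_adu = [6000, 18000, 35900, 64000]

print(bright_object_test(exposures_s, peak_counts_adu))
```

An exposure that fails this check marks the point where the pipeline's handling of bright sources needs a second look, before survey data start flowing.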
Another suggestion is to fund long-term data quality reviews as part of every survey's budget. Currently, postdocs and graduate students are expected to produce new science, not to check old data. A dedicated data quality team, with job security and a clear mandate, could catch errors like the saturation limit before they propagate into the public catalog. But such teams are rare, because funding agencies rarely see data quality as a research output.
However, even with best practices, such errors are hard to avoid entirely. The reason is that saturation is not a binary phenomenon: it depends on the exact brightness of the source, the atmospheric conditions, the telescope pointing, and the detector state. A pipeline that is tuned to avoid saturation for 99% of targets will still miss a few. And the incentives to find and fix those rare cases are weak, because they affect only a small fraction of the data. As one instrument scientist noted, "You could spend a year optimizing the saturation threshold and gain back 0.2% of your sample. Or you could spend that year observing new galaxies and get 10% more data. The math is not on the side of data quality."
The 2,000 missing redshifts are a small dent in a large survey, but they are also a reminder that the most expensive mistakes are often the ones that nobody notices. As the graduate student who found the error said, "I'm glad I caught it, but I wonder how many other surveys have similar issues that no one has looked for." The answer, likely, is more than a few.