
Add p-value adjustment based on local fdr #9335


Open

jake-soloff wants to merge 9 commits into main

Conversation

jake-soloff

  • tests added / passed.
  • code/documentation is well formatted.
  • properly formatted commit message. See
    NumPy's guide.

This PR adds a function lfdrcorrection for estimating the local false discovery rate (lfdr) based on a sequence of p-values. The method estimates the lfdr using the Grenander estimator of a monotone density. Rejecting the null whenever the estimated lfdr falls below some user-specified tolerance $\alpha$ yields the support line (SL) procedure. The SL procedure controls the lfdr; see the journal paper or the corresponding arXiv preprint. This result is analogous to the FDR control guarantee for the Benjamini-Hochberg procedure, implemented in fdrcorrection.
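As a rough illustration (not the PR's actual code), a minimal sketch of a Grenander-based lfdr estimate follows; the function name grenander_lfdr is made up, it uses scipy.optimize.isotonic_regression (available in SciPy 1.12+), and it assumes distinct, nonzero p-values:

import numpy as np
from scipy.optimize import isotonic_regression  # SciPy >= 1.12

def grenander_lfdr(pvals, pi0=1.0):
    # Grenander estimator of a decreasing p-value density: the left
    # derivative of the least concave majorant of the empirical CDF,
    # computed here by weighted antitonic regression on the raw slopes.
    pvals = np.asarray(pvals, dtype=float)
    m = pvals.shape[0]
    order = np.argsort(pvals)
    gaps = np.diff(np.concatenate(([0.0], pvals[order])))
    raw = (1.0 / m) / gaps  # raw slopes of the empirical CDF
    dens = isotonic_regression(raw, weights=gaps, increasing=False).x
    lfdr = np.empty(m)
    # lfdr = pi0 * f0 / f with uniform null density f0 = 1, capped at 1
    lfdr[order] = np.minimum(1.0, pi0 / dens)
    return lfdr

# SL procedure at tolerance alpha: reject = grenander_lfdr(pvals) <= alpha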

Notes:

  • It is essential that you add a test when making code changes. Tests are not
    needed for doc changes.
  • When adding a new function, test values should usually be verified in another package (e.g., R/SAS/Stata).
  • When fixing a bug, you must add a test that would produce the bug in main and
    then show that it is fixed with the new code.
  • New code additions must be well formatted. Changes should pass flake8. If on Linux or OSX, you can
    verify your changes are well formatted by running
    git diff upstream/main -u -- "*.py" | flake8 --diff --isolated
    
    assuming flake8 is installed. This command is also available on Windows
    using the Windows Subsystem for Linux once flake8 is installed in the
    local Linux environment. While passing this test is not required, it is good practice and it helps
    improve code quality in statsmodels.
  • Docstring additions must render correctly, including escapes and LaTeX.

@pep8speaks

pep8speaks commented Aug 26, 2024

Hello @jake-soloff! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 383:77: W291 trailing whitespace

Comment last updated at 2024-08-28 20:41:25 UTC

@josef-pkt
Member

josef-pkt commented Aug 27, 2024

Thanks, looks interesting. I need to read the article.

It looks like the unit tests are not run.
I guess the new method needs to be added to some @pytest.mark.parametrize list.
(update) Isotonic regression was added in SciPy version 1.12.0.

We don't have sklearn as a dependency. I made an inline comment.

Otherwise, this looks good and should be easy to review and merge.

-----

Assumes p-values are uniformly distributed under the null and have a
decreasing density under the alternative.
Member

AFAIU, it should also mention that the p-values are i.i.d., mainly that the p-values are assumed to be independent
(in contrast to other methods that allow for positively or negatively correlated p-values).

Author

I added independence as an assumption in the docstring since the main theory in the article assumes independence. Proposition 8 in the article allows for some dependence but it is a weaker result. Identically distributed is not necessary, however. Here is a result, mentioned in the discussion of the paper, to support this.

Suppose $p_1,\ldots,p_m$ are independent random variables on $[0,1]$ such that, if the $i$-th null $H_i$ is true, the corresponding $p$-value is uniformly distributed $p_i\sim \text{Uniform}(0,1)$. For the SL $(\alpha)$ procedure, the probability that the last rejection is null is exactly $\frac{m_0}{m}\alpha$, where $m_0$ denotes the number of nulls.

Interestingly, this result doesn't require decreasing alternative densities; however, controlling the probability that the last rejection is null is only a justified goal when the alternative densities are decreasing.
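A rough Monte Carlo check of this claim (illustrative only; it reuses the hypothetical grenander_lfdr sketch from the PR description above, which may differ in details from the exact SL($\alpha$) procedure in the paper):

import numpy as np

rng = np.random.default_rng(0)
m, m0, alpha, n_rep = 200, 150, 0.2, 2000
hits = 0
for _ in range(n_rep):
    # first m0 hypotheses are null (uniform p-values); the remaining
    # alternatives get a decreasing density, here Beta(0.25, 1)
    pvals = np.concatenate([rng.uniform(size=m0),
                            rng.beta(0.25, 1.0, size=m - m0)])
    reject = grenander_lfdr(pvals) <= alpha
    if reject.any():
        last = np.argmax(np.where(reject, pvals, -np.inf))
        hits += last < m0  # was the largest rejected p-value a null?
print(hits / n_rep, "vs", alpha * m0 / m)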

# restore the original (unsorted) order of the p-values
slopes = slopes[pvals_sortind.argsort()]
# local fdr estimate: estimated null proportion over the Grenander
# density estimate, capped at 1
ell_values = np.minimum(1, pi_0_est/slopes)

return (ell_values <= alpha), ell_values
Member

I think we should return a Holder or HolderTuple instance.
(preferred in new code)

Author

I removed the rejection decision (ell_values <= alpha) but I added the tail FDR estimates, so I changed the return statement to return Holder(fdr=fdr, lfdr=lfdr).
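For illustration, a hypothetical call against the updated return value (the import path and attribute names follow this discussion, not a released API):

import numpy as np
from statsmodels.stats.multitest import lfdrcorrection  # added by this PR

pvals = np.array([0.001, 0.01, 0.04, 0.3, 0.7])
res = lfdrcorrection(pvals)   # returns a Holder
reject = res.lfdr <= 0.2      # SL procedure at tolerance alpha = 0.2
fdr_est = res.fdr             # tail-FDR estimates implied by the lfdr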

pvals : array_like, 1d
List of p-values of the individual tests.
pi_0_est : float, optional
Estimate of the null proportion. Defaults to the conservative choice ``1.``.
Member

local_fdr uses null_proportion as the name of this argument.
Is it the same type of parameter?
We should try to keep names consistent across methods.

@josef-pkt
Member

josef-pkt commented Aug 27, 2024

I'm not going to try to fully understand the method (it's an Annals article and not easy to read).

The docstring should be informative enough to have a rough idea about what it does and how it (local fdr, max-lfdr) differs from standard FDR (BH).
(But I will not go over the details of the algorithm.)

To understand a bit of the background:
I never looked at the details of local FDR.

Why is the default alpha = 0.2? That is much larger than the usual 0.05.
Do the returned p-values (ell_values) correspond to the usual corrected p-values, or do they have a different (local?) meaning than BH-FDR?

(edit) The end of section 1.2 sounds like alpha corresponds to controlling the max-lfdr, not controlling the FDR as in BH.

On the code:
alpha is only used for the rejection decision and is not part of the algorithm.
So I think the function should only return the corrected p-values (ell_values) and not the rejection decision.
For consistency, multipletests can compute the rejection decision.
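A rough sketch of what that could look like (the method name "lfdr" is hypothetical here, not an existing multipletests option):

import numpy as np
from statsmodels.stats.multitest import multipletests

pvals = np.array([0.001, 0.01, 0.04, 0.3, 0.7])
reject, pvals_corrected, _, _ = multipletests(pvals, alpha=0.2, method="lfdr")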

Aside: The plan is to eventually remove the rejection decision from the return of multipletests. It was added mainly for historical reasons, when multiple testing articles were written in terms of rejection decisions and not in terms of corrected p-values.

Extra question (not in this PR):
Is there a two- or multi-stage version of this (similar to fdrcorrection_twostage "bky")?
For example, Theorem 4 and the simulations in the article use $\pi_0^\lambda$.

@josef-pkt
Member

josef-pkt commented Aug 27, 2024

IIUC, multitest would then have three different statistics that are controlled:

  • FWER
  • FDR
  • marginal FDR or max FDR

I guess local_fdr would also control marginal FDR but uses z-values instead of p-values as argument.
(add See Also)

In that case, we should make it more explicit in the multipletests docstring, especially for the alpha parameter (as long as it is still included) and for the meaning of the returned corrected p-values.

Fig. 4 on page 594 looks informative (once the definitions are understood).
(IIUC, I would prefer "marginal fdr" for intuition.)

Update

More on the difference between BH FDR (tail-area average of the local fdr) and the local fdr:

Efron, Bradley. 2007. “Size, Power and False Discovery Rates.” The Annals of Statistics 35 (4): 1351–77. https://doi.org/10.1214/009053606000001460.

@josef-pkt
Member

josef-pkt commented Aug 28, 2024

Looking a bit more at the background, including the fdrtool package in R:
https://cran.r-project.org/web/packages/fdrtool/index.html

Is it possible to return the tail-FDR (corrected p-values) implied by the local fdr estimate?

The current multipletests only needs the lfdr/pvalues_corrected.
But it might be useful to expand the standalone function lfdrcorrection with other interesting returns.
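One way this could look, as a minimal sketch: the tail-area FDR at each p-value is the running average of the lfdr over all p-values at or below it (cf. Efron 2007). The helper name is made up and not part of this PR:

import numpy as np

def tail_fdr_from_lfdr(pvals, lfdr):
    # average the lfdr values over the left tail p <= p_(k)
    order = np.argsort(pvals)
    fdr_sorted = np.cumsum(lfdr[order]) / np.arange(1, len(pvals) + 1)
    fdr = np.empty_like(fdr_sorted)
    fdr[order] = fdr_sorted  # restore the original order
    return fdr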

Related:
Using the normality assumption for the z-scores, I think we can add the current z-score based local_fdr as a method to multipletests (using norm.cdf and norm.ppf to convert between z-scores and p-values).
(edit) But we don't have the signs of the z-scores; p-values are positive.

The local_fdr function allows for a null_pdf option different from the default Gaussian, but in those cases the user can call local_fdr directly.
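A short sketch of the conversion mentioned above; with one-sided p-values the sign of the original z-scores cannot be recovered:

import numpy as np
from scipy.stats import norm

pvals = np.array([0.001, 0.01, 0.04, 0.3, 0.7])
zscores = norm.isf(pvals)      # p-value -> z-score, i.e. norm.ppf(1 - p)
pvals_back = norm.sf(zscores)  # z-score -> p-value, i.e. 1 - norm.cdf(z)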
