python - Correlation coefficients and p-values for all pairs of rows of a matrix




I have a matrix data with m rows and n columns. I used to compute the correlation coefficients between all pairs of rows using np.corrcoef:

import numpy as np
data = np.array([[0, 1, -1], [0, -1, 1]])
np.corrcoef(data)

Now I would also like to have a look at the p-values of these coefficients. np.corrcoef doesn't provide these; scipy.stats.pearsonr does. However, scipy.stats.pearsonr does not accept a matrix as input.

Is there a quick way to compute both the coefficient and the p-value for all pairs of rows (arriving e.g. at two m by m matrices, one with the correlation coefficients, the other with the corresponding p-values) without having to go through all pairs manually?

I encountered the same problem today.

After half an hour of googling, I couldn't find any code in the numpy/scipy libraries that does this.

So I wrote my own version of corrcoef:

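import numpy as np
# scipy.stats.betai is no longer available in recent SciPy releases;
# scipy.special.betainc is the same regularized incomplete beta function.
from scipy.special import betainc
from scipy.stats import pearsonr


def corrcoef(matrix):
    # Vectorized version: r from np.corrcoef, p-values from the t statistic.
    r = np.corrcoef(matrix)
    iu = np.triu_indices(r.shape[0], 1)
    rf = r[iu]
    df = matrix.shape[1] - 2
    ts = rf * rf * (df / (1 - rf * rf))  # squared t statistic
    pf = betainc(0.5 * df, 0.5, df / (df + ts))  # two-sided p-value
    p = np.ones(shape=r.shape)  # the diagonal stays at 1
    p[iu] = pf
    # Mirror into the lower triangle using the same pair order as iu.
    # np.tril_indices lists pairs in a different order once there are
    # four or more rows, so it cannot be used here directly.
    p[iu[1], iu[0]] = pf
    return r, p


def corrcoef_loop(matrix):
    # Reference version: call pearsonr on every pair of rows.
    rows = matrix.shape[0]
    r = np.ones(shape=(rows, rows))
    p = np.ones(shape=(rows, rows))
    for i in range(rows):
        for j in range(i + 1, rows):
            r_, p_ = pearsonr(matrix[i], matrix[j])
            r[i, j] = r[j, i] = r_
            p[i, j] = p[j, i] = p_
    return r, p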

The first version uses the result of np.corrcoef and then calculates the p-values from the upper-triangle values of the corrcoef matrix.

The second, loop version just iterates over the rows and calls pearsonr manually for each pair.
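The p-value step can be checked in isolation. The following is a minimal sketch, not part of the original answer: the helper name p_from_r and the random test data are made up for illustration. It assumes the usual two-sided t-test for a Pearson coefficient with n - 2 degrees of freedom, written via the regularized incomplete beta function, and compares the result against scipy.stats.pearsonr and against the t distribution directly.

```python
import numpy as np
from scipy.special import betainc
from scipy.stats import pearsonr, t as t_dist


def p_from_r(r, n):
    """Two-sided p-value for a Pearson r from n paired observations.

    Hypothetical helper for illustration; mirrors the formula used
    in the vectorized corrcoef.
    """
    df = n - 2
    t_squared = r * r * (df / (1.0 - r * r))
    return betainc(0.5 * df, 0.5, df / (df + t_squared))


# Made-up sample data: two loosely correlated vectors of length 20.
rng = np.random.default_rng(0)
x = rng.standard_normal(20)
y = 0.5 * x + rng.standard_normal(20)

r, p_scipy = pearsonr(x, y)
p_beta = p_from_r(r, len(x))

# Same quantity via the survival function of the t distribution.
t_stat = r * np.sqrt((len(x) - 2) / (1.0 - r * r))
p_t = 2.0 * t_dist.sf(abs(t_stat), len(x) - 2)

print(p_scipy, p_beta, p_t)
```

All three printed values should agree up to floating-point error, which is why the vectorized version can skip pearsonr entirely.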

def test_corrcoef():
    a = np.array([
        [1, 2, 3, 4],
        [1, 3, 1, 4],
        [8, 3, 8, 5]])
    r1, p1 = corrcoef(a)
    r2, p2 = corrcoef_loop(a)
    assert np.allclose(r1, r2)
    assert np.allclose(p1, p2)

The test passed; both versions give the same results.
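One caveat about a three-row test, offered as a side check rather than as part of the original answer: when a symmetric matrix is filled from a flat vector of upper-triangle values, the order in which NumPy lists the lower-triangle indices only coincides with the mirrored upper-triangle order for up to three rows. The sketch below uses two hypothetical helpers, mirrored_order_matches and fill_symmetric, to show the difference and one way to mirror values safely.

```python
import numpy as np


def mirrored_order_matches(n):
    """True if np.tril_indices(n, -1) lists pairs in the same order
    as the transposed pairs of np.triu_indices(n, 1)."""
    iu_rows, iu_cols = np.triu_indices(n, 1)
    il_rows, il_cols = np.tril_indices(n, -1)
    return (np.array_equal(iu_cols, il_rows)
            and np.array_equal(iu_rows, il_cols))


def fill_symmetric(values, n):
    """Build an n x n symmetric matrix (zero diagonal) from a flat
    vector of upper-triangle values."""
    out = np.zeros((n, n))
    iu = np.triu_indices(n, 1)
    out[iu] = values
    out[iu[1], iu[0]] = values  # swap row/column to mirror each pair
    return out


print(mirrored_order_matches(3))  # orders coincide for 3 rows
print(mirrored_order_matches(4))  # they differ from 4 rows on
m = fill_symmetric(np.arange(1, 7), 4)
print(np.array_equal(m, m.T))
```

A comparison against the loop version on a matrix with four or more rows would therefore be a stronger test than the three-row case.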

def test_timing():
    import time
    a = np.random.randn(100, 2500)

    def timing(func, *args, **kwargs):
        # Run func a fixed number of times and print the total wall time.
        t0 = time.time()
        loops = 10
        for _ in range(loops):
            func(*args, **kwargs)
        print('{} takes {} seconds loops={}'.format(
            func.__name__, time.time() - t0, loops))

    timing(corrcoef, a)
    timing(corrcoef_loop, a)


if __name__ == '__main__':
    test_corrcoef()
    test_timing()

The performance on my MacBook against a 100x2500 matrix:

corrcoef takes 0.06608104705810547 seconds loops=10

corrcoef_loop takes 7.585600137710571 seconds loops=10

