Archive | statistic analysis

Are We at a Tipping Point for Open Data?

18 Mar

Data sharing is on the rise: the French public health insurance openly shares its data, as you can see in the link below:


Multiple sclerosis, what about the south-north gradient?

4 Jan

Recent research found that the south-north gradient in the occurrence of multiple sclerosis is diminishing in the USA and no longer exists in France.
Here are the two articles:

Alvaro Alonso, MD; Miguel A. Hernán, MD
Temporal trends in the incidence of multiple sclerosis: A systematic review
Neurology 71, July 8, 2008
Available in full text here:

Fromont A, Binquet C, Sauleau EA, Fournel I, Bellisario A, Adnet J, et al. Geographic variations of multiple sclerosis in France. Brain 2010;133:1889-99.

Thus the epidemiology of multiple sclerosis seems to be shifting toward a less intense south-north gradient but, at the same time, toward a worsening sex ratio, increasing the burden that women bear in this inflammatory disease of the central nervous system (i.e. the central unit of our body, in information technology terms).

This trend opens new avenues for epidemiological studies in this field, all the more so because the observational studies aiming to establish the mixed genetic and environmental etiology are confounded by current population migrations, the mobility of young people looking for work, and changes in lifestyle in southern populations, such as greater use of sun protection.

Professor Confavreux depicted the stakes well in this editorial:

An unchanging man faced with changing times. Christian Confavreux.
Pages 1663-1665. First published online: 24 May 2012

For the moment, the etiological mechanism of multiple sclerosis is an enigma, and it risks remaining so if researchers in the field have no new data to crunch at the population level.


18 Oct

Epidemiology and geography have long shared common interests.
Epidemiologists have always searched for the causes of contagious diseases by locating the very place where the outbreak began. Hence the need to develop sophisticated geographical statistical analysis methods in order to locate the point where the disease originated before spreading across the country. But nowadays those methods are also used by researchers to highlight high concentrations of non-epidemic, chronic, degenerative diseases in a given country. Here the causal agent is no longer a bacterium or a virus but a concentrated pocket of social inequality (or pollution, depending on the research question). If there is a geographical concentration of lack of knowledge about what healthy behavior is, or of low incomes restricting access to a healthy life, then the analysis should uncover a higher prevalence of the degenerative disease; at least, this is the hypothesis. Below is a link to a paper that demonstrates very precisely how different geographical statistical analysis methods can lead to variation in the epidemiological results obtained. This point is crucial to consider because, whether it is the Ebola virus, social inequality or educational context, the causes of diseases will always have something to do with geography!

Big data challenges

7 Oct

Frontiers in Massive Data Analysis, from the National Research Council, nails some of the challenges of big data. But the challenges for massive data go beyond

via Big data challenges.

Propensity scores

25 Jul

A propensity score gives the probability that a subject in a population belongs to a group of interest, such as a treatment group.
Comparing subjects with the same propensity score across the treatment and no-treatment groups then enables the researcher to draw inferences about the effect of the treatment on a given outcome, even when working with merely observational data.
But the researcher must beware of unobserved differences between the group of interest and the comparison group created using the propensity score.
As always, the relevance of the model depends on the nature of the covariates entered into it.
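
To make the idea concrete, here is a minimal sketch in Python. It is not the method of the article cited below: it assumes a single categorical covariate, estimates the propensity score as the treated share within each covariate pattern, and then compares outcomes within strata of equal score. The records, the names `propensity_by_pattern`, `naive_effect` and `stratified_effect`, and all numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical observational records: (covariate pattern, treated?, outcome).
# All values are invented for illustration only.
records = [
    ("older", 1, 1), ("older", 1, 1), ("older", 1, 0), ("older", 0, 1),
    ("younger", 1, 0), ("younger", 0, 0), ("younger", 0, 0), ("younger", 0, 1),
]

def propensity_by_pattern(rows):
    """Estimate P(treated | covariates) as the treated share in each pattern."""
    counts = defaultdict(lambda: [0, 0])  # pattern -> [n_treated, n_total]
    for pattern, treated, _ in rows:
        counts[pattern][0] += treated
        counts[pattern][1] += 1
    return {pattern: t / n for pattern, (t, n) in counts.items()}

def naive_effect(rows):
    """Crude difference in mean outcome, ignoring the covariates."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return mean(treated) - mean(control)

def stratified_effect(rows):
    """Average the treated-minus-control difference within equal-score strata."""
    scores = propensity_by_pattern(rows)
    strata = defaultdict(lambda: {0: [], 1: []})
    for pattern, treated, outcome in rows:
        strata[round(scores[pattern], 6)][treated].append(outcome)
    weighted_sum, total = 0.0, 0
    for groups in strata.values():
        if not groups[0] or not groups[1]:
            continue  # no overlap in this stratum, so no comparison is possible
        n = len(groups[0]) + len(groups[1])
        weighted_sum += n * (mean(groups[1]) - mean(groups[0]))
        total += n
    if total == 0:
        raise ValueError("no stratum contains both treated and untreated subjects")
    return weighted_sum / total

print(naive_effect(records), stratified_effect(records))
```

In this invented data set the crude comparison shows no difference, while the comparison within equal-score strata does. Nothing in the sketch protects against unobserved differences between the groups.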

Garrido, M. M., Kelley, A. S., Paris, J., Roza, K., Meier, D. E., Morrison, R. S. and Aldridge, M. D. (2014), Methods for Constructing and Assessing Propensity Scores. Health Services Research. doi: 10.1111/1475-6773.12182

Case base study vs case control study

25 Jul

Unlike case-control studies, case-base studies are well suited to the cross-sectional extractions from reimbursement databases that we usually do.
Case-base studies use the whole population of the database as the control group, including the subjects who are affected by the disease (i.e. the cases).
Thus, since it makes no difference whether the subjects have the disease or not, the control group is far easier to constitute.
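
A small sketch of why this works, under simplifying assumptions: when the base is the whole database (cases included), the exposure odds among the cases divided by the exposure odds in the base equals the risk ratio. This is plain arithmetic, not the regression-based method of the article cited below; the function name `case_base_risk_ratio` and the counts are hypothetical.

```python
def case_base_risk_ratio(exposed_cases, unexposed_cases, exposed_base, unexposed_base):
    """Risk ratio from a case series and the base population (cases included).

    (exposed_cases / unexposed_cases) / (exposed_base / unexposed_base)
    is algebraically equal to
    (exposed_cases / exposed_base) / (unexposed_cases / unexposed_base),
    i.e. the risk in the exposed divided by the risk in the unexposed.
    """
    if min(unexposed_cases, exposed_base, unexposed_base) <= 0:
        raise ValueError("counts used as denominators must be positive")
    case_odds = exposed_cases / unexposed_cases
    base_odds = exposed_base / unexposed_base
    return case_odds / base_odds

# Hypothetical database of 10,000 beneficiaries: 2,000 exposed, 8,000 unexposed.
# Among them, 60 exposed and 120 unexposed beneficiaries have the disease.
rr = case_base_risk_ratio(60, 120, 2000, 8000)
print(rr)  # risk 0.03 in the exposed vs 0.015 in the unexposed
```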

Citation: Chui TT-T, Lee W-C (2013) A Regression-Based Method for Estimating Risks and Relative Risks in Case-Base Studies. PLoS ONE 8(12): e83275. doi:10.1371/journal.pone.0083275

Pitfalls of retrospective database studies

30 Mar

As you know, part of my work consists of participating in studies based on extractions from retrospective databases and on the analysis of the information thus retrieved. The eligibility of beneficiaries for the benefit that constitutes the study’s outcome is always a major concern. There are two explanations for a beneficiary appearing not to have had access to care according to the data retrieved from the reimbursement database: either a real lack of access, or the care not being eligible for a record in the reimbursement database (for example if the insured person is covered by another insurer, or has lost coverage and left the health plan)*. I always have to keep in mind that I work on secondary data, which are only a reflection of the primary data whose reality I try to grasp.
The dilemma is addressed quite well in this article:

*As always, there is a third possibility: the data concerning the care have been erased from, or not yet recorded in, the database. The timeline of the database refresh (i.e. the loading and cleaning of the data) must be precisely described in the methodology of the study.

Article cited:
1)- Motheral, B., Brooks, J., Clark, M. A., Crown, W. H., Davey, P., Hutchins, D., Martin, B. C. and Stang, P. (2003),

A Checklist for Retrospective Database Studies—Report of the ISPOR Task Force on Retrospective Databases.

Value in Health, 6: 90–97. doi: 10.1046/j.1524-4733.2003.00242.x

Two other articles address the pitfalls of drawing inferences from secondary data extracted from a retrospective database:

2)- Berger M, Mamdani M, Atkins D, Johnson M.

Good Research Practices for Comparative Effectiveness Research: Defining, Reporting and Interpreting Nonrandomized Studies of Treatment Effects Using Secondary Data Sources: The ISPOR Good Research Practices for Retrospective Database Analysis Task Force Report—Part I.

Value in Health 2009; 12(8): 1044-52

3)- Motheral B.R., Fairman K.A. (Outcomes Research, Express Scripts Inc., Maryland Heights, USA)

The use of claims databases for outcomes research: rationale, challenges, and strategies. Annual international meeting of the Association for Pharmacoeconomics and Outcome Research, Philadelphia, Pennsylvania (USA), 1996/05/12.

Clinical Therapeutics 1997; 19(2): 346-366. 74 refs. ISSN 0149-2918

Full text of the article here:



4 Feb

Medicine, and health care in particular, is always a matter of time: time needed for recovery, time until the cure is completed, time elapsed until relapse, survival time. Time raises questions for health services researchers. What if an event occurred at a time when nobody was present to attest the exact moment of its onset? What if we know when a particular health condition ended but not exactly when it began? If a health condition (disease or good health, depending on what the researcher is studying) is interrupted for a short period and then resumes, how should the interruption interval be handled? Last but not least, when a researcher does not have enough time to wait for the final result, should the entire observation be discarded? How do mathematicians grasp this curious entity that we call time?

Their mathematical answer is: interval censored data.
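
As an illustration only (in Python rather than SAS), here is a minimal sketch of how interval-censored observations can enter a likelihood. It assumes an exponential survival model, which is my simplifying assumption and not something the post prescribes. Each observation is an interval (left, right) known to contain the event: left = 0 encodes left censoring and right = infinity encodes right censoring. The data and the names `survival`, `log_likelihood` and `grid_mle` are hypothetical.

```python
import math

def survival(t, rate):
    """Exponential survival function S(t) = exp(-rate * t); S(infinity) = 0."""
    return 0.0 if math.isinf(t) else math.exp(-rate * t)

def log_likelihood(intervals, rate):
    """Each interval (left, right) contributes log(S(left) - S(right))."""
    total = 0.0
    for left, right in intervals:
        if not 0 <= left < right:
            raise ValueError("each interval needs 0 <= left < right")
        total += math.log(survival(left, rate) - survival(right, rate))
    return total

def grid_mle(intervals, candidates):
    """Crude maximum-likelihood estimate: best rate among the candidates."""
    return max(candidates, key=lambda rate: log_likelihood(intervals, rate))

# Hypothetical follow-up data, in months.
observations = [
    (0.0, 3.0),       # event already present at the first visit (left-censored)
    (2.0, 5.0),       # event occurred between two visits (interval-censored)
    (4.0, math.inf),  # still event-free at the last visit (right-censored)
]

rates = [i / 100 for i in range(1, 101)]  # candidate rates 0.01 .. 1.00 per month
best_rate = grid_mle(observations, rates)
print(best_rate)
```

No observation is thrown away: the incomplete ones simply contribute less precise information to the estimate.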

Many thanks to SAS and its programmers!

Happy are the Buddhists with their here and now philosophy 😉



Flawed evidence based medicine

26 Jan

James Coyne is a professor of psychology who has dedicated an important part of his research to denouncing the flawed evidence-based medicine applied to the practice of psychology. His writings on the PLOS blogs are an example of the critical appraisal we should all exercise when we read labels such as “evidence supported”.
Below is a link to his posts on the PLOS blogs:

Convenience sample

25 Jan

Convenience samples allow researchers to easily get a first look at what is happening in a given population. But the researcher must always keep in mind that such a sampling method, in addition to being easy, is also always exposed to bias. Once a first look has been taken, the results obtained by means of a convenience sample must be confirmed with a random sample, a cluster sample or a stratified sample. The two YouTube videos below are very helpful for grasping the concept.
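
The bias can also be seen in a small simulation. The sketch below is a toy example with an invented population: a convenience sample that surveys only easy-to-reach urban residents is compared with a proportionate stratified sample. The strata, sizes, prevalences and the names `make_stratum` and `prevalence` are all hypothetical.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

def make_stratum(name, size, n_with_condition):
    """Build a shuffled list of (stratum, has_condition) tuples."""
    people = [(name, 1)] * n_with_condition + [(name, 0)] * (size - n_with_condition)
    rng.shuffle(people)
    return people

# Hypothetical population: 3,000 urban residents (30% with the condition)
# and 7,000 rural residents (10% with the condition).
urban = make_stratum("urban", 3000, 900)
rural = make_stratum("rural", 7000, 700)
population = urban + rural

def prevalence(sample):
    """Share of the sample that has the condition."""
    return sum(status for _, status in sample) / len(sample)

true_prevalence = prevalence(population)  # 1,600 / 10,000 by construction

# Convenience sample: only the easy-to-reach urban residents are surveyed.
convenience = rng.sample(urban, 500)

# Proportionate stratified sample: each stratum contributes its population share.
stratified = rng.sample(urban, 150) + rng.sample(rural, 350)

print(true_prevalence, prevalence(convenience), prevalence(stratified))
```

Because the convenience sample never reaches the rural stratum, its estimate reflects the urban prevalence rather than that of the whole population, however many people are surveyed.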
