CJ Data Studio
Modeling the Electroencephalogram (EEG) Signals Acquired during a Seizure of a Patient with Epilepsy Using Fourier Transformation

September 3, 2014 chelsea jin

This is an early attempt at analyzing time series data with scripts, using MATLAB.  It was only a school assignment, but I found it interesting to model these EEG signals, acquired during an epileptic patient's seizure, which look chaotic at first glance.

And yes, looking at the autocorrelation and partial autocorrelation estimates, the data are highly autocorrelated, with many significant lags (around 20), even conditional on the first lag.
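
For anyone who wants to reproduce this diagnostic step, a minimal sketch is below.  The original work was done in MATLAB, so this Python version is only illustrative, and the file name is a placeholder.

    import numpy as np
    import matplotlib.pyplot as plt
    from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

    # Load the seizure EEG records (hypothetical file name).
    eeg = np.loadtxt("eeg_seizure.txt")

    # Autocorrelation and partial autocorrelation estimates, inspected
    # for how many lags remain significant.
    fig, axes = plt.subplots(2, 1, figsize=(8, 6))
    plot_acf(eeg, lags=40, ax=axes[0])
    plot_pacf(eeg, lags=40, ax=axes[1])
    plt.show()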

It would be reasonable to fit an ARIMA(p, d, q) model to such a time series, with very large p and q, but I wanted to try a different approach and see how it works.  Here, I came up with the idea of using a Fourier transformation.

First, I split the 25,000 records into odd and even records.  The odd records were used to build the model, and the even records were held out for cross-validation.  Second, a fast Fourier transform was applied to the odd records, taking them from the time domain to the frequency domain, and the low-amplitude frequency components were removed, on the assumption that those signals correspond to background physiological activity such as heartbeat and breathing.  Third, an inverse Fourier transform was used to generate model-based records, and these generated records were compared with the odd records to obtain residuals.  As seen in the residual plot, the residuals were small.  In addition, the autocorrelation of the residuals was checked, and they showed little autocorrelation.
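
A minimal sketch of these three steps is below, again in Python rather than the original MATLAB; the file name, the amplitude cutoff, and the variable names are my own assumptions.

    import numpy as np

    def fft_model(signal, keep_fraction=0.05):
        """Keep only the strongest frequency components and reconstruct the signal."""
        spectrum = np.fft.rfft(signal)
        amplitude = np.abs(spectrum)
        # Zero out low-amplitude components, assumed to reflect background
        # physiological activity such as heartbeat and breathing.
        cutoff = np.quantile(amplitude, 1.0 - keep_fraction)
        spectrum[amplitude < cutoff] = 0.0
        return np.fft.irfft(spectrum, n=len(signal))

    # Step 1: split the 25,000 records into odd (model) and even (validation) halves.
    eeg = np.loadtxt("eeg_seizure.txt")          # hypothetical file name
    odd, even = eeg[0::2], eeg[1::2]

    # Steps 2 and 3: fit in the frequency domain, invert, and take residuals.
    fitted = fft_model(odd)
    residuals_model = odd - fitted               # residuals against the odd records
    residuals_cv = even - fitted                 # residuals against the held-out even records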

The model-generated records (green) were produced by the fast Fourier transform.  These records were compared with the original odd records (blue) to obtain residuals (right panel).

Left: autocorrelation of the model residuals on the odd records.  Right: model-generated records plotted against the even records held out for cross-validation.

A further step in validating the model was to compare the model-generated records with the even records held out for cross-validation.  Again, the residuals were plotted and checked for autocorrelation.
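
One programmatic way to check the residual autocorrelation, continuing the sketch above, is a Ljung-Box test (the original work inspected autocorrelation plots; the test is my substitute):

    from statsmodels.stats.diagnostic import acorr_ljungbox

    # Large p-values indicate no evidence of autocorrelation in the residuals.
    print(acorr_ljungbox(residuals_model, lags=[10, 20]))
    print(acorr_ljungbox(residuals_cv, lags=[10, 20]))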

The residuals obtained from cross-validation (left) were compared with the residuals obtained from the odd records (right).  Both were small and showed no appreciable autocorrelation.

This exercise let me try a different approach to modeling time series data.  Given more time, and in a more realistic setting, I would think further about how to refine the method, understand the similarities and differences between classical time series models and the Fourier transformation for modeling time series data, and learn how to interpret and use the parameters from the Fourier transformation so the method can be applied more generally.


Ecstasy Use among US Adolescents from 1999 to 2008

September 2, 2014 chelsea jin

Gender differences in adolescent ecstasy (left) and marijuana (right) use, 1999-2008.  Percentages of adolescents of each gender, in each year, who were lifetime users.

The analyses were based on nationally representative data from the National Survey on Drug Use and Health (NSDUH).  The pooled sample exceeds half a million records, which made this ten-year trend analysis possible.  In addition, the prevalence estimates were calculated for each survey year and adjusted for the multilevel sampling design of this national study.
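
As an illustration of the prevalence step, a weighted calculation might look like the sketch below.  The file and column names are assumptions; a full analysis would also use the NSDUH stratum and cluster variables to obtain design-based standard errors, which this snippet does not do.

    import pandas as pd

    # Hypothetical pooled NSDUH extract with one row per respondent.
    nsduh = pd.read_csv("nsduh_1999_2008.csv")

    # Survey-weighted lifetime-use prevalence, by survey year and gender.
    prevalence = (nsduh
                  .groupby(["year", "gender"])
                  .apply(lambda g: (g["lifetime_ecstasy"] * g["weight"]).sum()
                                   / g["weight"].sum()))
    print(prevalence.unstack("gender"))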

Gender differences in the prevalence of ecstasy and marijuana use were examined with a meta-analysis, adjusting for covariates such as age, ethnicity, household income, and population density.  The analysis used hierarchical Bayesian linear models, which allow results from multiple datasets collected in different survey years to be combined and summarized.
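
A minimal sketch of the hierarchical pooling idea is below, written with PyMC; the yearly gender-difference estimates and standard errors are made-up placeholders, not NSDUH results, and the published analysis may well have used different software and a richer model.

    import numpy as np
    import pymc as pm

    # Made-up yearly gender differences (e.g. on the log-odds scale) and their
    # standard errors, one per survey year from 1999 to 2008.
    diff = np.array([0.8, 1.1, 0.9, 1.2, 1.0, 0.7, 0.9, 1.1, 1.0, 0.8])
    se = np.full(10, 0.2)

    with pm.Model():
        mu = pm.Normal("mu", 0.0, 10.0)                       # overall gender difference
        tau = pm.HalfNormal("tau", 1.0)                       # between-year heterogeneity
        theta = pm.Normal("theta", mu, tau, shape=len(diff))  # year-specific differences
        pm.Normal("obs", theta, se, observed=diff)            # observed yearly estimates
        idata = pm.sample(1000, tune=1000, chains=2)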


Time Series Factor Analysis on U.S. Daily Treasury Yield Curve Rates

September 1, 2014 chelsea jin

Time series of U.S. Daily Treasury Yield Curve Rates, from Jul 2012 to Jul 2014.

Looking at the time series of daily Treasury yield curve rates, we can see that the series trend in a similar way, even though their levels differ.  This raises the question of whether only a few underlying forces drive the trend.  A factor analysis of the time series is one way to answer it.

First, I de-trended the 11 series by differencing, and the result looks as shown below.  The 1-, 3-, 6-, and 12-month rates look alike; so do the 2-, 3-, 5-, and 7-year rates as one group, and the 10-, 20-, and 30-year rates as another.

Differenced U.S. Daily Treasury Yield Curve Rates from Jul 2012 to Jul 2014.  The first column contains the differenced yield curve rates of 1-, 3-, 6-, and 12-month; the second column contains the differenced 2-, 3-, 5-, and 7-year rates; and the third column has the differenced 10-, 20-, 30-year rates.

The 3-factor exploratory model essentially explains the similarities and differences between the series.  Although the 4-factor model has better overall fit indices, such as RMSEA and AIC, it has one factor on which all series load weakly.  So I prefer the 3-factor model, and the plots below show the factor scores of the de-trended series, as well as the factor scores transformed back to the original scale.
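
A minimal sketch of the differencing and factor analysis steps is below; the CSV file and column layout are assumptions, and scikit-learn does not report RMSEA, so the model comparison described above would need other tooling.

    import numpy as np
    import pandas as pd
    from sklearn.decomposition import FactorAnalysis

    # Hypothetical file with one column per maturity (1-month through 30-year).
    rates = pd.read_csv("treasury_yield_rates.csv", index_col=0, parse_dates=True)
    diffed = rates.diff().dropna()                 # de-trend the 11 series by differencing

    fa = FactorAnalysis(n_components=3, rotation="varimax")
    scores = fa.fit_transform(diffed.values)       # differenced factor scores
    loadings = pd.DataFrame(fa.components_.T, index=rates.columns,
                            columns=["F1", "F2", "F3"])
    print(loadings)

    # One way to move the factor scores back toward the original (undifferenced)
    # scale is cumulative summation -- an assumption about what the second plot shows.
    original_scale_scores = np.cumsum(scores, axis=0)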

Differenced Factor Scores of the 3-Factor Model.

Factor Scores of the 3-Factor Model.

Details upon request.


Concordance between Gambling Disorder Diagnoses in the DSM-IV and DSM-5

August 29, 2014 chelsea jin

Sensitivity, specificity, and hit rates using the DSM-5 (4 of 9 criteria) classification system relative to DSM-IV (5 of 10 criteria).

Limited by what a Gallery page offers, I found that larger images can be posted in blog posts and opened in a lightbox when clicked.  So I am starting my very first blog post now!

Here are some data visualization samples supplementary to a published paper (the full-text paper can be obtained by clicking the title of this post).

The original purpose of the publication was to examine the effect of changing the diagnostic criteria for gambling disorder from DSM-IV (5 of 10) to DSM-5 (4 of 9).  The plot above visualizes the sensitivity, specificity, and overall hit rates resulting from the change.  The paper also found a significant increase in prevalence after lowering the threshold.  Moreover, the increase appears to come only from reducing the number of criteria that must be met, not from eliminating the "Illegal Acts" criterion; a detailed discussion can be found in the published article.  This raises the question of whether the "Illegal Acts" criterion really matters for classifying a person as a pathological gambler.  Furthermore, since the initial draft also showed changes in prevalence across gender, age, and racial subgroups, a second question is whether the effect of changing the diagnostic criteria differs across subgroups.

To address these questions, two further analyses were conducted.  The first was a latent class analysis (LCA) that classified the sample into two groups, with and without gambling disorder, based on all 10 criteria.  The results are summarized in the plot shown in this post.  In the plot, each dot represents the probability that a person meets a given criterion, conditional on being classified into a particular group (with or without gambling disorder).  Clearly, regardless of whether a person is classified as having gambling disorder, the probability of meeting the "Illegal Acts" criterion is low, and it is nearly the same in both groups.  In other words, a person has an almost equally low probability of committing illegal acts whether or not he is a pathological gambler, meaning illegal acts occur with similar likelihood, and not very often, in both pathological gamblers and non-gamblers.  This LCA result supports the idea that eliminating the "Illegal Acts" criterion may not have much impact on diagnoses.

Latent class analysis to classify people into two groups, with and without gambling disorder, based on 10 criteria.
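
Since latent class analysis is not part of the standard Python scientific stack, the sketch below hand-rolls the two-class case as a mixture of independent Bernoullis fitted by EM; it only illustrates the idea and is not the software used for the paper, and the data matrix is a hypothetical placeholder.

    import numpy as np

    def lca_two_class(X, n_iter=200, seed=0):
        """X: (n_people, n_criteria) binary matrix. Returns class weights and
        per-class probabilities of endorsing each criterion."""
        rng = np.random.default_rng(seed)
        n, k = X.shape
        pi = np.array([0.5, 0.5])                      # class weights
        p = rng.uniform(0.25, 0.75, size=(2, k))       # P(criterion | class)
        for _ in range(n_iter):
            # E-step: posterior probability of each class for each person.
            log_lik = (X @ np.log(p).T) + ((1 - X) @ np.log(1 - p).T) + np.log(pi)
            log_lik -= log_lik.max(axis=1, keepdims=True)
            resp = np.exp(log_lik)
            resp /= resp.sum(axis=1, keepdims=True)
            # M-step: update class weights and criterion probabilities.
            pi = resp.mean(axis=0)
            p = (resp.T @ X) / resp.sum(axis=0)[:, None]
            p = np.clip(p, 1e-6, 1 - 1e-6)
        return pi, p

    # p[c, j] estimates the probability of meeting criterion j given membership
    # in class c -- the quantity plotted as each dot in the figure above.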

The other analysis was a logistic regression with an interaction term between the diagnostic system and each subgroup (e.g. DSM*gender, where DSM=1 represents DSM-IV and 0 represents DSM-5; gender=1 is female and 0 is male).  The results are also shown in a plot.  Although the change in criteria from DSM-IV to DSM-5 resulted in an increase in prevalence, the increase is roughly equal across all the sub-populations, since the interaction terms are not significant.

Odds ratios obtained from comparing the two diagnostic systems for gambling disorder, DSM-IV (5 of 10) and DSM-5 (4 of 9).
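
A minimal sketch of the interaction model is below, using a statsmodels formula; the data frame layout and column names are assumptions about how the analysis file might look.

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical long-format file: one row per person per diagnostic system.
    # dsm: 1 = DSM-IV (5 of 10), 0 = DSM-5 (4 of 9); gender: 1 = female, 0 = male.
    df = pd.read_csv("gambling_diagnoses.csv")

    model = smf.logit(
        "diagnosed ~ dsm * gender + age + C(ethnicity) + income + density",
        data=df).fit()
    print(model.summary())
    # A non-significant dsm:gender coefficient indicates the DSM-5 prevalence
    # increase is roughly equal for women and men.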

Finally, the two major questions raised from the original draft have been addressed.  The reviewers of the paper were satisfied, and the article can be read in Psychology of Addictive Behaviors (2014). :)
