
Time series analysis paper — analyze completely


For Problem 2, you are to evaluate the given analysis and interpretation for clarity, completeness, sufficiency, accuracy, and consistency. Indicate what you think is good, not good, and what you would do differently. Note: points will be deducted for comments on format. The critique must be about predictive analytics, not layout. Do not copy the report into your exam, rather use the

2 Return Surgeries, 15 points

For this problem, you are to evaluate the analysis and interpretation for clarity, completeness, sufficiency, accuracy, and consistency. Indicate what you think is good, not good, and what you would do differently. Keep your assessments by the numbering system to assure your criticisms coincide with the respective material.

I do not want comments on format. Whether a figure is not in a convenient place is of no interest. Grammar and spelling are of no interest. I will mark down for non-essential criticism. Focus on the analysis and the interpretation of the results.

2.1 Introduction

A continuing question on daily return surgeries time series indices is whether any one series is interchangeable with another; i.e., does one surgery index time series have the same daily counts as some other particular surgery index time series within specified statistical error? Statistical differences in surgery index time series include, e.g., networks of multiple observers or counting methodology. The Debrecen index is compared to the Surgery Tracking And Recognition Algorithm (STARA) index, and with the Addendum of Authenticated and Verified Surgery Observations (AAVSO) index.

A pairwise comparison of index counts is confounded by the possible autocorrelation of each series, and hence a traditional regression-type comparison is inappropriate as the autocorrelation violates the regression independence assumption. In addition, two aspects of a time series must be examined for a comparison: when the count occurred and the count magnitude. The analytical methodologies include autocorrelation and cross-correlation from statistical time series analysis to determine when a count occurred, and the nonparametric Wilcoxon signed rank test to compare magnitudes of the count. The time series analyses are used to determine pairwise day-by-day alignment. Once the paired series are time-aligned, the count magnitudes are compared using the Wilcoxon signed rank test as the counts data do not follow a normal (Gaussian) distribution.

Section 2.2 is a description of the three returning time series data sets; Section 2.3 discusses the time series statistical analyses of each of the three data sets; Section 2.4 is the data set comparisons, or statistical time series cross-correlation analysis, including a brief explanation of why regression is inappropriate; Section 2.5 is the count magnitude comparisons; and Section 2.6 is the conclusions.

2.2 Data Sets

This section describes the daily returning surgeries time series of the AAVSO, Debrecen, and STARA data sets. The descriptions indicate some of the characteristics that must be accounted for prior to a statistical comparison. The AAVSO series is described first, followed by the Debrecen series, and ending with the STARA series.

2.2.1 AAVSO Data

The AAVSO’s program of data-gathering and analysis of surgeries has been active since its inception in 1944. AAVSO raw data are submitted monthly as sets of date- and time-stamped values. The pre-scrubbed AAVSO data contain 34,435 returning surgery counts that span from May 1, 2010 through July 12, 2013. The left panel of Figure 1 shows that these data are truncated on the left at zero counts and skewed to the right. The histogram suggests these count data follow a Poisson distribution.

2.2.2 Debrecen Data

The pre-scrubbed Debrecen data contain 41,866 daily returning surgery counts that span from December 4, 1981 through January 5, 2011. As with the AAVSO data, the middle panel of Figure 1 shows that these data are truncated on the left at zero counts and skewed to the right. The histogram suggests these data follow a Poisson distribution.

Figure 1: AAVSO, Debrecen, and STARA index counts histograms of the pre-scrubbed data. The green dashed curves are best-fit exponential distributions, and the black solid curves are best-fit gamma distributions.

2.2.3 STARA Data

The STARA data contain 1,152 daily returning surgery counts that span from May 1, 2010 through July 12, 2013. The right panel of Figure 1 shows that these data are truncated on the left at zero counts, and are skewed to the right. This suggests these counts data follow a Poisson distribution.

The Poisson distributions of each of these data sets affect the accuracy of the paired count magnitude comparisons, as will be seen below.

2.3 Autocorrelation Analysis

A time series is a stochastic process where the index set is of countable time increments; i.e., a time series is a set of observations, xt, each recorded at a specified time t. To allow for the possibly unpredictable nature of future observations we may suppose that each observation is a realization of a random variable Xt. The time series {xt, t ∈ T0} is a realization of the family of random variables {Xt, t ∈ T}, where T ≥ T0; i.e., the realization xt is a subset of all possible values of Xt. The following time series process analyses assess whether the count pairings are index set (time) aligned. This alignment is necessary before paired counts magnitude comparisons can be made. The time series autocorrelation analysis is preceded by a descriptive analysis of the data sets.

2.3.1 Descriptive Analysis

The AAVSO and Debrecen series have days with multiple observations which we summarize by the count median. Further, the time span of each data set must be matched. The common span is found to be from May 1, 2010 through January 5, 2011. The time series that result from using the daily median and matched spans are displayed in Figures 2 and 3. Figure 2 depicts the three series in a stacked, matched-span plot. The AAVSO data are in the upper panel, the Debrecen data are in the middle panel, and the lower panel has the STARA data. These plots show ambiguously matched count magnitudes.
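The two preparation steps described above, collapsing multiple same-day observations to a daily median and trimming every series to the common date span, can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names `daily_median` and `match_span` and the toy observations are assumptions for the example, not part of the report or its data.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def daily_median(observations):
    """Collapse (date, count) observations to one median count per day."""
    by_day = defaultdict(list)
    for day, count in observations:
        by_day[day].append(count)
    return {day: median(counts) for day, counts in by_day.items()}


def match_span(*series):
    """Restrict each {date: value} series to the date range they all cover."""
    start = max(min(s) for s in series)  # latest first day
    end = min(max(s) for s in series)    # earliest last day
    return [{d: v for d, v in s.items() if start <= d <= end} for s in series]


# Toy observations (hypothetical, not the report's data).
a = [(date(2010, 5, 1), 3), (date(2010, 5, 1), 7),
     (date(2010, 5, 2), 4), (date(2010, 5, 3), 6)]
b = [(date(2010, 4, 30), 1), (date(2010, 5, 1), 5), (date(2010, 5, 2), 2)]

a_daily, b_daily = match_span(daily_median(a), daily_median(b))
print(sorted(a_daily.items()))
print(sorted(b_daily.items()))
```

Note that matching the span does not by itself guarantee that every day inside the span is present in every series; missing days would still need separate handling.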

Figure 2: The AAVSO (top panel), Debrecen (middle panel), and STARA (bottom panel) matched-span time series plot. The data are daily.

Figure 3 has the three matched-span series superimposed over each other. The AAVSO series is the solid black curve, the Debrecen series is the dashed red curve, and the STARA series is the dotted green curve. As with the stacked plot, this plot also shows no obvious coincidence in count magnitude.

Figure 3: The three matched-span series superimposed. The AAVSO series is the solid black curve, the Debrecen series is the dashed red curve, and the STARA series is the dotted green curve. The data are daily.

Fortunately, statistical time series analysis is able to remove much of the apparent ambiguity. Time series analysis will help determine if the counts are time-aligned. Once this outcome is available, a magnitude comparison is possible.

2.3.2 Autocorrelation Models

Before the counts time series magnitudes can be compared, the individual time series must be examined for autocorrelation, as autocorrelation inflates the series variability. A critical property of any time series is stationarity, which is required to assess the autocorrelations and cross-correlations. Stationarity is the property of a time series in which, over a specified time span, the mean and variance of the series are constant. This is the time series analysis equivalent of the mean zero, constant variance assumption requirement for such statistical methods as the t-test, analysis of variance, and regression. If a time series follows a Gaussian distribution, then it can be shown that the time series is stationary.

We saw above that the three counts time series do not follow a normal distribution, and hence stationarity may not be assumed. A commonly used transformation to obtain a stationary time series is differencing. A first difference transformation is

$$\nabla X_t = X_t - X_{t-1} = X_t - B X_t = (1 - B) X_t, \tag{1}$$

where ∇Xt is the tth first difference operation between the tth and the (t−1)st values of the random variable X, and B is the back shift operator such that BXt = Xt−1. The differencing operator may be extended to second (∇(2)), third (∇(3)), etc., differences, as can the back shift operator B, but higher order differencing is not needed for the return surgeries time series. The first difference transformation results in stationarity for each of the three series.
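As a small illustration of the differencing operator in equation (1), the sketch below applies first and second differences to a toy series. The helper name `difference` and the toy values are assumptions made for the example; higher orders are obtained simply by applying the first difference repeatedly.

```python
def difference(series, order=1):
    """Apply the first-difference operator (1 - B) `order` times."""
    out = list(series)
    for _ in range(order):
        # Each pass replaces x_t with x_t - x_{t-1} and shortens the series by one.
        out = [curr - prev for prev, curr in zip(out, out[1:])]
    return out


# Toy series with a linear trend (hypothetical values).
trend = [2, 5, 8, 11, 14]
print(difference(trend))           # first differences: the trend becomes a constant
print(difference(trend, order=2))  # second differences of a linear trend vanish
```

Each application of the operator shortens the series by one observation, which is worth keeping in mind when differenced series are later paired day by day.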

With stationarity established, we can examine each series for autocorrelation. The sample autocorrelation function (ACF) and the sample partial autocorrelation function (PACF), and their associated plots, are used to identify if and what types of autocorrelation exist. The sample ACF measures time series white noise autocorrelation as a moving average order. The sample PACF measures time series autocorrelation as the order of autoregression.

In Figures 4 and 5, the panels on the diagonal depict the first-differenced (lag 1) series sample ACF and sample PACF respectively. The off-diagonal panels are unadjusted cross-correlations between paired series, and are here ignored pending further time series analysis. In each figure, the row one column one panel is the AAVSO series, the second row second column panel is the Debrecen series, and the third row third column is the STARA series, each after taking first differences. We are interested in the plot lag values of each panel that extend above or below the horizontal blue dashed 95% confidence interval (CI) lines. Each series has 211 days of return surgery counts, and at the 95% CI, this suggests that there are 0.05 × 211 ≈ 11 expected CI marginal overreaches. We therefore are interested in those lag patterns that strongly extend outside the CI band.
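For readers who want to see what the diagonal ACF panels compute, here is a minimal sketch of a sample autocorrelation function together with a 95% band. The band ±1.96/√n is the usual large-sample white-noise approximation and is an assumption about how the dashed lines in the figures were drawn; the function names and the toy series are likewise illustrative only.

```python
import math


def sample_acf(series, max_lag):
    """Sample autocorrelations r_0..r_max_lag using the 1/n autocovariance estimator."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev) / n
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / n / c0
            for k in range(max_lag + 1)]


def white_noise_band(n, z=1.96):
    """Half-width of the approximate 95% band for white noise (assumed form)."""
    return z / math.sqrt(n)


# Toy alternating series (hypothetical), which has strong negative lag-1 correlation.
toy = [1, -1, 1, -1, 1, -1, 1, -1]
acf = sample_acf(toy, 2)
band = white_noise_band(211)  # band half-width for a 211-day series
print(acf, round(band, 4))
```

The number of bars expected to cross the band by chance depends on how many lags are plotted, since each plotted lag is one opportunity for a marginal overreach.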

Figure 4 is the sample ACF of the three series. The zeroth lag (t = t) is ignored in each sample ACF plot. The AAVSO plot suggests a lag 1 (preceding day) moving average model should be examined. The Debrecen plot indicates that both a lag 1 and a lag 3 moving average model may be appropriate. The STARA plot, like the AAVSO plot, suggests a lag 1 moving average model should be tested.

Figure 4: The sample ACFs of the AAVSO, Debrecen, and STARA time series.

Figure 5 is the sample PACF of the three series. In each panel on the diagonal of the plot, there are no systematic overreaches of the CIs, i.e., the overreaches appear random, which suggests no autoregressive behavior in these three series.

Figure 5: The sample PACFs of the AAVSO, Debrecen, and STARA time series.

The sample ACF and sample PACF suggest the types of time series models for each surgery count source. The models take the form of Autoregressive Integrated Moving Average (ARIMA) models. The models are denoted as ARIMA(p, d, q), where AR refers to the autoregressive component, I refers to the integrated component which determines the order of differencing to establish stationarity, MA refers to the moving average component, and p, d, and q are the non-negative integers indicating the orders of autoregression, integration, and moving averaging, respectively. The ARIMA analysis of the AAVSO series gives an ARIMA(0, 1, 1) model, the Debrecen model is ARIMA(0, 1, 3), and the STARA model is ARIMA(1, 1, 3).

Goodness-of-fit indicators for the ARIMA models are cumulative periodograms of the model standardized residuals, and time series plots of the standardized residuals. The behavior of the model residuals is particularly important for the cross-correlation analysis below. Figures ?? and ?? are the diagnostics for the AAVSO ARIMA(0, 1, 1) model. Figure ?? is the cumulative periodogram. The blue dashed diagonal lines define a 95% CI band that, if the black cumulative periodogram curve lies within, suggests the model is adequate. Containment of the curve within the CI band suggests it follows a normal distribution, which is an indicator of model adequacy. Figure ?? has three diagnostic plots. The top panel is the standardized residuals time series plot which indicates an adequate model when no more than 11 residuals exceed the plus or minus 3 standard deviation levels. The middle panel is the sample ACF of the residuals, which suggests the ARIMA model is adequate as all the lags lie within the horizontal red dashed 95% CI levels. The bottom panel is the Ljung-Box p-value plot in which no p-values fall below the threshold line, indicating an adequate model. Hence, the ARIMA(0, 1, 1) may be considered a reasonable model of the AAVSO series.

Figure 6: AAVSO series ARIMA model diagnostic plots.

Figures ?? and ?? are the diagnostics for the Debrecen ARIMA(0, 1, 3) model. Figure ?? is the cumulative periodogram which suggests it follows a normal distribution. Figure ?? has the time-based diagnostic plots. The standardized residuals time series plot has only 2 of the possible 11 values that lie outside ±3 standard deviations. The sample ACF of the residuals suggests the ARIMA model has all the lags within the 95% CI band. The Ljung-Box p-value plot has no p-values below the horizontal red threshold line. Hence, the ARIMA(0, 1, 3) may be considered a reasonable model of the Debrecen series.

Figures ?? and ?? are the diagnostics for the STARA ARIMA(1, 1, 3) model. Figure ?? is the cumulative periodogram which suggests the periodogram is normally distributed. Figure ?? has the three time series diagnostic plots. The standardized residuals time series plot has no residuals outside the plus or minus 3 standard deviation levels. The sample ACF of the residuals has all the lags within the horizontal red dashed 95% CI levels. The Ljung-Box p-value plot has no p-values below the horizontal red threshold line. Hence, the ARIMA(1, 1, 3) may be considered a reasonable model of the STARA series.

The autocorrelation of each of the three return surgery data sets has been identified and described. The residuals analyses show that the residuals of each time series are reduced to white noise, and thus the residuals are independent between any series pair. This is an important property for the series comparisons. We may now make pairwise comparisons of the data sets.

Figure 7: Debrecen series ARIMA model diagnostic plots.

2.4 Cross-Correlation Analysis

The panel of scatter plots of the count sources in Figure 9 shows the paired series associations. The second row, column one panel shows that the Debrecen versus AAVSO data have a clear nonlinear relationship, with the smaller counts having the greater nonlinearity and the large counts having the greater variability. A similar nonlinear relationship exists between the Debrecen and STARA series, which is depicted in the second row, column three panel. However, the STARA versus AAVSO data exhibit a more nearly linear relationship, though the variability of the larger counts is greater. This relationship is shown in the panel in the third row of the first column. Some of these characteristics have been addressed by constructing ARIMA models for each series, and it is with these models that the cross-correlations, i.e., the time-based alignment, may be developed.

Figure 8: STARA series ARIMA model diagnostic plots.

With autocorrelated data it is difficult to assess the dependence or comparison between any two time series. It is therefore necessary to disentangle the linear association between any two series from their respective autocorrelations. Another property that must be satisfied is that the two series must be stationary and independent of each other. While the data may be stationary, they must still be transformed to white noise to assure independence. The transformation may be accomplished by using the residuals from the respective series ARIMA models. We saw from the ARIMA model diagnostics that the residuals from the series ARIMA models are white noise, thus implying that the residuals of the ARIMA models are independent. For example, it was shown that the AAVSO data are adequately modeled by an ARIMA(0, 1, 1) with no intercept term, so, for xt representing the AAVSO counts,

$$\bar{x}_t = z_t - \theta z_{t-1} = (1 - \theta B)\, z_t, \tag{2}$$

where x̄t is the white noise model return surgery count at time t, zt is the white noise value at time t, and θ is the white noise parameter that is estimated from the ARIMA model analysis. The ARIMA model residuals x̄t, t = 0, ±1, ±2, ..., are white noise and this process is known as prewhitening.

We now compare the two series using the cross-correlation function (CCF) by prewhitening one series with its ARIMA model. The other series then is filtered through this same ARIMA model. Stationarity is assured by incorporating the first difference in the ARIMA filter. As prewhitening is a linear operation, any linear relationship between the two series will be preserved after prewhitening. For example, to compare the AAVSO data with the Debrecen data, first prewhiten the AAVSO data using its ARIMA model. Then filter the Debrecen data with the AAVSO ARIMA model. Finally, use the CCF to look for lags between the two series.
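The prewhiten-then-filter procedure described above can be sketched in a few lines. This is a minimal illustration, not the report's analysis: it assumes an ARIMA(0, 1, 1) model with a known θ (here a made-up value of 0.4), inverts it recursively with a zero starting value, and uses simulated toy data in which the second series is the first delayed by two days. The names `arima011_filter` and `sample_ccf` are hypothetical, and the lag sign convention (entry k correlates x at time t + k with y at time t) is a choice made for this sketch.

```python
import math
import random


def arima011_filter(series, theta):
    """Invert (1 - B) x_t = (1 - theta B) z_t: z_t = (x_t - x_{t-1}) + theta * z_{t-1}."""
    z, prev = [], 0.0  # assumes a zero starting value for the recursion
    for x_prev, x_curr in zip(series, series[1:]):
        prev = (x_curr - x_prev) + theta * prev
        z.append(prev)
    return z


def sample_ccf(x, y, max_lag):
    """Return {k: sample correlation of x_{t+k} with y_t} for k in -max_lag..max_lag."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    scale = math.sqrt(sum(v * v for v in dx) * sum(v * v for v in dy))
    ccf = {}
    for k in range(-max_lag, max_lag + 1):
        ccf[k] = sum(dx[t + k] * dy[t] for t in range(n) if 0 <= t + k < n) / scale
    return ccf


# Simulated toy data (hypothetical): x follows an ARIMA(0, 1, 1), y is x delayed two days.
rng = random.Random(0)
theta = 0.4
shocks = [rng.gauss(0.0, 1.0) for _ in range(200)]
x = [0.0]
for t in range(1, 200):
    x.append(x[-1] + shocks[t] - theta * shocks[t - 1])
y = [x[0], x[0]] + x[:-2]

zx = arima011_filter(x, theta)  # prewhitened x
zy = arima011_filter(y, theta)  # y passed through the same filter
ccf = sample_ccf(zx, zy, max_lag=5)
peak_lag = max(ccf, key=lambda k: abs(ccf[k]))
print(peak_lag)
```

In this toy setup the dominant spike sits away from lag zero because the delay was built in; two series that are already aligned day by day would instead show their dominant spike at lag zero.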

Often a regression model is used to measure the relationship of one counts series to another. The fallacy of this method arises from the violation of two assumptions of regression: (i) the response must follow a normal distribution, and (ii) the two series must be independent. The first assumption was shown above to be violated as the counts follow a Poisson distribution. The second assumption is violated as demonstrated by the autocorrelation identified in the ARIMA model analyses, which is an indication of non-independence.

Figure 9: Scatter plots of the return surgeries count sources show the paired series associations.

Figure 10 is the sample CCF between the ARIMA(0, 1, 1) filtered Debrecen counts and the ARIMA(0, 1, 1) prewhitened AAVSO counts. It is clear from the plot that the only lag is at zero, which suggests that the two series are nearly aligned in time.

Figure 11 is the sample CCF between the ARIMA(0, 1, 1) filtered STARA counts and the ARIMA(0, 1, 1) prewhitened AAVSO counts. The plot shows balance between the AAVSO and the STARA data. The AAVSO series and the STARA series are balanced at lag 0. This balance suggests that the two series are aligned in time.

Figure 12 is the sample CCF between the ARIMA(1, 1, 3) filtered Debrecen counts and the ARIMA(1, 1, 3) prewhitened STARA counts. The AAVSO series and the STARA series are balanced at lag 0. This balance suggests that the two series are aligned in time.

The cross-correlation analysis gives the pairwise time alignments to compare the magnitude of the counts for each series. The cross-correlation between the AAVSO and Debrecen series has zero lag and hence they are aligned. The same result holds for the cross-correlation between the AAVSO and STARA data, i.e., they are aligned. Similarly, the cross-correlation between the STARA and Debrecen data shows they are aligned.

2.5 Magnitude Comparison

Figure 10: The sample CCF between the ARIMA(0, 1, 1) filtered Debrecen counts and the ARIMA(0, 1, 1) prewhitened AAVSO count residuals.

Figure 11: The sample CCF between the ARIMA(0, 1, 1) filtered STARA counts and the ARIMA(0, 1, 1) prewhitened AAVSO count residuals.

Figure 12: The sample CCF between the ARIMA(1, 1, 3) filtered Debrecen counts and the ARIMA(1, 1, 3) prewhitened STARA count residuals.

With the appropriate shifts for each return surgery counts series if needed, the counts magnitude comparison is tested with the Wilcoxon signed ranks test. This test is used over the t-test as the counts data do not follow a normal distribution, which is an assumption required for the t-test. From the n∗ time-ordered data pairs (x1,1, x2,1), (x1,2, x2,2), ..., (x1,n∗, x2,n∗), the differences whose absolute values are to be ranked are found such that

$$D_i = x_{1,i} - x_{2,i}, \quad i = 1, \ldots, n^*. \tag{3}$$

Simplistically, all differences with the value 0 are eliminated so the remaining differences are n ≤ n∗. The n |Di| differences are ordered from lowest to highest, and then are ranked 1 to n. The ith rank Ri is designated as a positive rank if Di > 0, or Ri is designated as a negative rank if Di < 0. The test statistic is the sum of the positive signed ranks:

$$T^* = \sum_{i \,:\, D_i > 0} R_i, \quad i = 1, \ldots, n. \tag{4}$$

The test statistic T ∗ is compared to the quantiles of a distribution whose shape varies depending on conditions.
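The steps leading to equation (4) can be traced on a small example. The sketch below drops zero differences, ranks the absolute differences, and sums the ranks belonging to positive differences. Assigning average ranks to tied absolute differences is a common convention that the text above does not specify, so it is an assumption here, as are the function name `signed_rank_statistic` and the toy counts.

```python
def signed_rank_statistic(x1, x2):
    """Return (T*, n): the sum of positive signed ranks and the number of nonzero differences."""
    diffs = [a - b for a, b in zip(x1, x2) if a != b]  # zero differences are eliminated
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1 (assumed tie rule)
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    t_star = sum(r for r, d in zip(ranks, diffs) if d > 0)
    return t_star, len(diffs)


# Toy paired counts (hypothetical, not the report's data).
series_1 = [5, 3, 8, 6, 4]
series_2 = [3, 3, 5, 7, 8]
t_star, n = signed_rank_statistic(series_1, series_2)
print(t_star, n, n * (n + 1) // 2)
```

Because the ranks 1 to n sum to n(n + 1)/2, the statistic T∗ always lies between 0 and n(n + 1)/2, which gives a quick sanity check on any reported value.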

Table 2 lists the surgery counts time series pairs and their respective Wilcoxon signed rank test statistics. The last column in the table indicates if the count magnitudes may be considered statistically equal. Only the STARA and Debrecen time series have statistically identical daily counts.

Table 2: Wilcoxon rank sum test with continuity correction counts magnitude comparison.

X        Y          n     W         P(>W)       X = Y
AAVSO    Debrecen   211   35368.5   < 2.2e-16   no
AAVSO    STARA      211   34903     < 2.2e-16   no
STARA    Debrecen   210   22286.5   0.8468      yes

2.6 Conclusions

Three time series of daily returning surgeries counts were compared for interchangeability; i.e., does one return surgery time series have the same daily counts as some other particular time series within specified statistical error? Each series had peculiarities, e.g., networks of multiple observers or counting methodology, for which some adjustments were made in the time series and magnitude analyses.

The Debrecen time series was compared to the STARA time series, and with the AAVSO time series. Also, the STARA and AAVSO series were compared. These daily time series were shown to be autocorrelated which was accounted for before the series were compared.

Each time series was made stationary by taking the first difference. The autocorrelation function and the partial autocorrelation function were used to identify the order and type of autocorrelation for each of the series. The analysis of the AAVSO series gave the ARIMA(0, 1, 1) model, the Debrecen series analysis gave the ARIMA(0, 1, 3) model, and the STARA analysis gave the ARIMA(1, 1, 3) model.

The cross-correlation function (CCF) between the ARIMA(0, 1, 1) filtered Debrecen counts and the ARIMA(0, 1, 1) prewhitened AAVSO counts showed the count changes occurred on the same days. It was clear from the plot that there was no lagging, which suggested that the two series were time-aligned. The CCF between the ARIMA(0, 1, 1) filtered STARA counts and the ARIMA(0, 1, 1) prewhitened AAVSO counts showed that the count series were time-aligned. The CCF between the ARIMA(1, 1, 3) filtered Debrecen counts and the ARIMA(1, 1, 3) prewhitened STARA counts suggested that the Debrecen series and the STARA data are time aligned.

After the appropriate series shifts were made, the magnitude of the series counts was compared. Table 2 gives the details of the counts magnitude comparisons, and the table shows that only the STARA and Debrecen series are interchangeable.

We showed that returning surgeries time series counts comparisons are best made after a statistical time series analysis is performed. We also showed that, as the counts do not follow a normal distribution, the appropriate magnitude comparison statistical method is the Wilcoxon signed ranks test provided the series pairings first are time-aligned. The results showed that only the Debrecen series and the STARA series are interchangeable.

3 Bonus, 3D VAR(2) Model, 5 points

Set up the three-dimensional (3D) VAR(2) where the third variable does not Granger-cause the first variable. The Bonus.R script may help.

4 Bonus, “Best Model”, 5 points

Give criteria for aiding in the choice of a “best” time series model when two or more such models are available. What is, arguably, the most important criterion?

  • Time Series Model Construction, 20 points
    • Fossil Fuels Company Stocks
    • Blackhole Detection from Suspected Gravity Lensing
  • Return Surgeries, 15 points
    • Introduction
    • Data Sets
      • AAVSO Data
      • Debrecen Data
      • STARA Data
    • Autocorrelation Analysis
      • Descriptive Analysis
      • Autocorrelation Models
    • Cross-Correlation Analysis
    • Magnitude Comparison
    • Conclusions
  • Bonus, 3D VAR(2) Model, 5 points
  • Bonus, “Best Model”, 5 points
