Year: 2016 | Volume: 2 | Issue: 2 | Page: 57-63
Forecasting visitor accession trend of two prominent Indian Health Journal websites for the period 2015-2020 using time series analysis
Nidhi Dwivedi, Sandeep Sachdeva
Department of Community Medicine, North DMC Medical College and Hindu Rao Hospital, New Delhi 110 007, India
Date of Web Publication: 30-Aug-2016
Source of Support: None, Conflict of Interest: None
Objective: To determine the pattern and forecast the visitor accession trend of two national academic journal websites, the Indian Journal of Community Medicine (IJCM) and the Indian Journal of Public Health (IJPH), for the period 2015-2020. Materials and Methods: The visitor accession details (number of times a journal issue was accessed online) for the period 2000-2014 (15 years) were collected and recorded on a Microsoft Excel sheet. Time series analysis was then applied to the dataset using different forecasting models to predict the future trend and values of accession for this real dataset using R software (version 3.1). Results: Both Indian journals are managed by independent professional bodies, but the IJCM journal website went online in 2007, 3 years ahead of IJPH (2010), leading to a very high accession (a proxy indicator for volume of readership) of IJCM during this period, ranging between 100,000 and 120,000 counts; thereafter, accession was noticed to be slightly higher for IJPH than IJCM. The time series sequence showed that both had a similar pattern: first stage, an initial slow rise; second stage, a sudden increasing trend from 2007 to 2010 (IJCM) and 2010 to 2012 (IJPH), respectively; and third stage, a decreasing trend with superimposed seasonal fluctuations. Future predicted accession details of IJCM and IJPH for 2015-2020 by the Holt-Winters fitting model suggest stagnation, with online accession of a journal issue ranging from 30,360 to 31,860 counts for IJPH and from 20,997 to 25,581 counts for IJCM. The range of accession for IJCM (4584) was wider than that for IJPH (1500), thereby reflecting that IJPH will attain stagnation earlier than IJCM. The autoregressive integrated moving average model also reflected similar results. The Ljung-Box test indicated that the model was statistically adequate (P = 0.825 for IJCM and P = 0.50 for IJPH), and there was no statistically significant difference between actual values and values predicted by the model. For the IJCM dataset, R² = 0.678 means that the model could explain 67.8% of the observed variation in the series, while it was able to explain 63.3% of the variation in the IJPH series. Conclusion: Within its limitations, this study provides information on the pattern and trend of visitor accession of public health journal websites. The information unraveled from this study may further aid in planning and in strengthening publication standards, along with experimentation with innovative ideas to enhance visibility and global participation, with a focus on retaining and enhancing the journal user base.
Keywords: Autoregressive integrated moving average, behavior, cyclic, forecasting, online, prediction, public health, readers, regression, seasonal, smoothing technique, statistical modelling, trend, viewership
How to cite this article:
Dwivedi N, Sachdeva S. Forecasting visitor accession trend of two prominent Indian Health Journal websites for the period 2015-2020 using time series analysis. Digit Med 2016;2:57-63
How to cite this URL:
Dwivedi N, Sachdeva S. Forecasting visitor accession trend of two prominent Indian Health Journal websites for the period 2015-2020 using time series analysis. Digit Med [serial online] 2016 [cited 2022 Oct 2];2:57-63. Available from: http://www.digitmedicine.com/text.asp?2016/2/2/57/189512
Introduction
In day-to-day life, many situations arise where analysis of past data is needed to make current decisions and to forecast future events. This objective can be fulfilled using statistical techniques called time series analysis, wherein observations are collected at regular time intervals. These observations have an order in which they appear and are found to be correlated. Typically, a time series comprises four components (variations), and there are traditionally three approaches (models) for forecasting future values. The variations are: (1) trend variation (long-term change in the mean); (2) seasonal variation (patterns that recur over a fixed and known period, e.g., quarter of a year, month, etc.); (3) cyclic changes (the data exhibit rises and falls that are not of a fixed or known period); and (4) irregular component (any fluctuations that remain after excluding the above-mentioned variations from a time series). The models include regression-based methods, exponential smoothing methods, and autoregressive integrated moving average (ARIMA) models.
In the last decade, the potential of the web in research, academic learning, and commercial usage has grown explosively. Innovative ideas are being applied using statistical knowledge to bring out newer thought processes and products for various stakeholders. With this background, a study was undertaken to determine the trend and pattern, and to forecast the visitor accession, of two national public health academic journal websites, (1) Indian Journal of Community Medicine (IJCM) and (2) Indian Journal of Public Health (IJPH), for the period 2015-2020.
Materials and Methods
Background information of two prominent National Public Health Journals is as follows.
Indian Journal of Community Medicine
The IJCM (ISSN = 0970-0218 [Print]; 1998-3581 [Electronic]) is the official organ of the Indian Association of Preventive and Social Medicine, a nonprofit professional body established in the year 1974. IJCM is an open access, peer-reviewed, quarterly, international publication in the English language, indexed in a large number of databases and available in PubMed since 2008-09. IJCM has been available online (www.ijcm.org.in) since September 15, 2007, including the availability and display of previous issues. The SCImago Journal Rank (SJR) of IJCM (2014) was 0.51 with 1.02 cites/document in the last 2 years.
Indian Journal of Public Health
The IJPH (ISSN = 0019-557X [Print]; 2229-7693 [Electronic]) is the official organ of Indian Association of Public Health (IAPH), a nonprofit professional body established in the year 1956. IJPH is also an open access, peer-reviewed, quarterly, international publication in the English language, indexed in a large number of databases, and the earliest records available in PubMed date back to the year 1961. IJPH has been available online (www.ijph.in) since September 25, 2010, including the availability and display of previous issues. The SJR of IJPH (2014) was 0.37 with 0.81 cites/document in the last 2 years. Most of the previous issues were made available online after the appearance of the website. Both journals have outsourced printing and multimedia journal website management to the same private vendor, and both apply graded article processing charges to authors.
The journal hosting site routinely captures, stores, and displays anonymous accession details of each manuscript/issue in a consolidated manner. The accession counts were extracted for a 15-year period, i.e., January 2000-December 2014, to forecast the accession trend for the period 2015-2020. During this period, IJCM published volumes 25 (year 2000) to 39 (year 2014), whereas IJPH published volumes 44 (year 2000) to 58 (year 2014), with four quarterly issues per year (January-March, April-June, July-September, and October-December), leading to a total of 60 issues by each journal during this span.
Publicly available visitor accession data (a proxy indicator for volume of readership) were captured on July 1, 2015, recorded on a Microsoft Excel sheet, and displayed in [Table 1] and [Table 2]. There were no missing values, but it is pertinent to mention that the accession count is a cumulative figure over time. One limitation is that it does not differentiate between new and repeat visitors, as the same reader can revisit a number of times, thereby increasing the accession count.
Table 1: Accession counts of Indian Journal of Community Medicine journal website for the period 2000-2014
Table 2: Accession counts of Indian Journal of Public Health journal website for the period 2000-2014
Different forecasting models were then utilized to predict the future trend of accession and value of a real data set using R software (version 3.1).
The following steps were followed during modeling and forecast analysis:
- Time series data were examined to determine the presence of basic features such as trend, seasonal behavior, or both
- Eliminate any trend or seasonal components, either by differencing or by fitting an appropriate model to the data. In our dataset, both components were present and were eliminated using software commands (Holt-Winters [H-W] and auto ARIMA)
- Develop a forecasting model for the residuals. We used 20% of the dataset for "training" to find the parameters of the models, i.e., H-W and ARIMA
- Validate the performance of the model from the previous step. The objective of this step is to select a particular model to be used in forecasting. We used the remaining 80% of the dataset for "testing"
- Check the difference between the original time series and the values forecasted by the model
- To compare models, forecast accuracy was measured using mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE)
In this study, H-W and ARIMA models were applied.
Holt-Winters model
The H-W method is also known as the triple exponential smoothing model and is recommended when seasonality exists in the time series data. It is based on three smoothing equations: one for the level, one for the trend, and one for seasonality. In the H-W method, there are three smoothing parameters, α, β, and γ, each ranging from 0 to 1.
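The three smoothing equations can be made concrete with a minimal Python sketch. It assumes the additive form of the method and a simple start-up rule based on the first two seasons; the article does not state which form was fitted, and the R routine used by the authors chooses its own starting values and optimizes α, β, and γ. The function name `holt_winters_additive` and its arguments are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple


def holt_winters_additive(
    y: List[float], m: int, alpha: float, beta: float, gamma: float, horizon: int
) -> Tuple[List[float], List[float]]:
    """Additive triple exponential smoothing with season length m.

    Returns (one-step-ahead fitted values for y[m:], forecasts for `horizon` steps).
    """
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons of data")

    # Start-up values (an assumption, other initialisations exist):
    # trend = average per-period change between the first two seasonal means,
    # level = trend line through the first season, evaluated at its last point,
    # seasonal = deviation of each first-season value from that trend line.
    first = sum(y[:m]) / m
    second = sum(y[m:2 * m]) / m
    trend = (second - first) / m
    centre = (m - 1) / 2
    season = [y[i] - (first + trend * (i - centre)) for i in range(m)]
    level = first + trend * (m - 1 - centre)

    fitted = []
    for t in range(m, len(y)):
        s_prev = season[t - m]                      # seasonal term one season back
        fitted.append(level + trend + s_prev)       # one-step-ahead prediction
        new_level = alpha * (y[t] - s_prev) + (1 - alpha) * (level + trend)
        new_trend = beta * (new_level - level) + (1 - beta) * trend
        season.append(gamma * (y[t] - new_level) + (1 - gamma) * s_prev)
        level, trend = new_level, new_trend

    n = len(y)
    # Forecast h steps ahead: extrapolate level and trend, reuse the latest
    # seasonal term that has the same position within the season.
    forecasts = [
        level + h * trend + season[n - m + (h - 1) % m]
        for h in range(1, horizon + 1)
    ]
    return fitted, forecasts
```

For quarterly issue counts one would call it with `m=4`. On a series that is exactly a straight line plus a repeating quarterly pattern, the sketch reproduces the data regardless of the parameter values, which is a convenient sanity check.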
Autoregressive integrated moving average model
The autoregressive integrated moving average model has three parameters and is often written as ARIMA (p, d, q), where p is the order of the autoregressive part, i.e., the number of past values (lags) used to forecast the current value. The parameter d is the number of times the original time series must be differenced for it to become stationary. The parameter q is the order of the moving average process.
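To illustrate the role of d, the following Python sketch applies ordinary (lag-1) differencing d times and also shows seasonal differencing at a chosen lag. The helper names are hypothetical, and the lag of 4 mentioned in the note after the code is an assumption based on the quarterly issues, not a setting reported in the article.

```python
from typing import List


def difference(series: List[float], lag: int = 1) -> List[float]:
    """Return series[t] - series[t - lag] for every t >= lag."""
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def difference_d_times(series: List[float], d: int) -> List[float]:
    """Apply lag-1 differencing d times, as the ARIMA parameter d prescribes."""
    out = list(series)
    for _ in range(d):
        out = difference(out, lag=1)
    return out
```

With made-up quarterly counts such as `[100, 120, 110, 90, 140, 160, 150, 130, 180, 200, 190, 170]`, `difference_d_times(counts, 1)` removes the linear trend, and a further `difference(..., lag=4)` removes the repeating quarterly pattern, leaving a series of zeros.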
Forecast accuracy measures
Suppose our dataset is denoted by $y_1, \ldots, y_T$, and we split it into two sections: the training data ($y_1, \ldots, y_N$) and the test data ($y_{N+1}, \ldots, y_T$). To check the accuracy of a forecasting method, we estimate the parameters using the training data and forecast the next $T - N$ observations. These forecasts can then be compared to the test data.
The forecast errors are the differences between the actual values in the test set and the forecasts produced. Thus, $e_t = y_t - \hat{y}_{t|N}$ for $t = N+1, \ldots, T$, where $\hat{y}_{t|N}$ denotes the predicted value.
In this paper, we have used three popular measures of accuracy, i.e., MAE, RMSE, and MAPE.
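As a minimal sketch of this procedure, the Python code below splits a series chronologically, forms the forecast errors, and computes the three measures using their usual textbook definitions: MAE is the mean of the absolute errors, RMSE is the square root of the mean squared error, and MAPE is the mean of the absolute errors expressed as a percentage of the actual values. The function names are hypothetical, and the code is not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple


def split_series(y: Sequence[float], n_train: int) -> Tuple[List[float], List[float]]:
    """Chronological split: the first n_train points train, the rest test."""
    return list(y[:n_train]), list(y[n_train:])


def forecast_errors(actual: Sequence[float], predicted: Sequence[float]) -> List[float]:
    """Error = actual value minus forecast, for each test-set point."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return [a - p for a, p in zip(actual, predicted)]


def mae(errors: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mape(actual: Sequence[float], errors: Sequence[float]) -> float:
    """Mean absolute percentage error; undefined if any actual value is 0."""
    return 100.0 * sum(abs(e / a) for a, e in zip(actual, errors)) / len(errors)
```

MAE and RMSE are in the units of the series (accession counts), so they are only comparable across models fitted to the same journal, whereas MAPE is unit-free.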
Results
Both Indian journals are managed by independent professional bodies, but the IJCM journal website went online in 2007, 3 years ahead of IJPH (2010), leading to a very high accession (~readership) of IJCM during this period, ranging between 100,000 and 120,000 counts. This first-time developmental phenomenon, which accompanied the spread and penetration of the internet in the country, plateaued with the online appearance of the IJPH journal, and thereafter visitor accession was noticed to be slightly higher for IJPH than IJCM [Figure 1]. Average visitor accession per issue for the period 2000-2014 was 44,246.47 (issue 1), 45,401.0 (issue 2), 44,045.93 (issue 3), and 33,070.4 (issue 4) for IJCM, whereas it was 20,856.2, 15,061.0, 13,603.13, and 17,659.87, respectively, for IJPH. The time series analysis showed that both had a similar pattern: first stage, an initial slow rise; second stage, a sudden increasing trend from 2007 to 2010 (IJCM) and 2010 to 2012 (IJPH), respectively; and third stage, a decreasing trend with superimposed seasonal fluctuations.
Figure 1: Line graph of accession details of Indian Journal of Community Medicine and Indian Journal of Public Health website, 2000-2014
[Figure 2] and [Figure 3] show the autocorrelation function (ACF) and partial ACF plots, which indicate that the dataset reaches a stationary level.
[Figure 4] and [Figure 5] show that the observed actual values (black line) and the predicted model values (red line) matched reasonably well for both the IJCM and IJPH datasets, and the trend is consistent.
[Figure 6] and [Figure 7] show the predicted accession details of IJCM and IJPH during 2015-2020 by the H-W fitting model, suggesting stagnation, with counts ranging from 30,360 to 31,860 for IJPH and from 20,997 to 25,581 for IJCM. The range of accession for IJCM (4584) was wider than that for IJPH (1500), thereby reflecting that IJPH will attain stagnation earlier than IJCM.
Figure 6: Future predicted value of Indian Journal of Community Medicine website accession
Figure 7: Future predicted value of Indian Journal of Public Health website accession
[Table 3] shows the smoothing parameter (α) of the H-W model, which governs the relative influence of recent and older data on the model. The value of α ranges from 0 to 1. Values nearer to 1 mean that the level estimate is driven mainly by the most recent observations, whereas values nearer to 0 spread the weight over a longer history. For IJCM (α = 0.91), the fit is dominated by the most recent data, i.e., the weight given to the 2014 values is far greater than that given to the 2013 values, and so on. For IJPH (α = 0.30), the weights decay more slowly, so the 2013 and earlier values retain appreciable influence alongside the 2014 values.
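As a check on how α distributes weight across observations, recall that in exponential smoothing the level gives the observation k steps back a weight of α(1 − α)^k. The short Python sketch below tabulates these weights for the two reported α values. It deliberately ignores the trend and seasonal terms of the full H-W model, and the function name is hypothetical.

```python
from typing import List


def smoothing_weights(alpha: float, n_lags: int) -> List[float]:
    """Weight on the observation k steps back, for k = 0 .. n_lags - 1."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    return [alpha * (1 - alpha) ** k for k in range(n_lags)]


# Reported alpha values: IJCM = 0.91, IJPH = 0.30
ijcm_weights = smoothing_weights(0.91, 4)
ijph_weights = smoothing_weights(0.30, 4)
```

With α = 0.91, the latest observation carries 91% of the weight and the one before it about 8%. With α = 0.30, the first three weights are 30%, 21%, and about 15%, so older observations still matter.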
Table 3: Holt-Winters model parameter values for Indian Journal of Community Medicine and Indian Journal of Public Health dataset
[Table 4] shows the best fitted ARIMA model parameters for both datasets. Different combinations of ARIMA model parameters were tried on the training dataset, and it was found that the ARIMA (0, 1, 0) (1, 1, 0) model was the best fit, with the lowest variance (IJCM = 8352, IJPH = 3144) and Akaike information criterion (IJCM = 1290.59, IJPH = 1238.35).
Table 4: Autoregressive integrated moving average model parameter values for Indian Journal of Community Medicine and Indian Journal of Public Health dataset
[Table 5] shows the comparison between the H-W and best fitted ARIMA models using the popular model error statistics MAE, RMSE, and MAPE. This table shows that all three error measures were lower for the H-W model than for the ARIMA model, suggesting that the H-W model was a better choice for forecasting than the ARIMA model in the current data series.
In [Table 6], the Ljung-Box (modified Box-Pierce) test indicated that the model was statistically adequate (P = 0.825 for IJCM and P = 0.50 for IJPH, respectively), i.e., the residuals showed no significant autocorrelation. There was no statistically significant difference between actual values and values predicted by the model. Although the time series model offers a number of different goodness-of-fit statistics, the stationary R² value was used here; larger values of stationary R² (up to a maximum value of 1) indicate better fit. For the IJCM dataset, R² = 0.678, meaning that the model could explain 67.8% of the observed variation in the series; similarly, for the IJPH dataset, it was able to explain 63.3% of the variation.
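For readers unfamiliar with the Ljung-Box test, the statistic is Q = n(n + 2) Σ ρ_k² / (n − k), summed over lags k = 1 to h, where n is the number of residuals and ρ_k is their sample autocorrelation at lag k. The Python sketch below computes Q only. Converting Q to a P value requires a chi-square distribution whose degrees of freedom depend on h and on the number of fitted parameters, neither of which is reported in the article, so that step is left out. The function names are hypothetical.

```python
from typing import Sequence


def autocorrelation(x: Sequence[float], k: int) -> float:
    """Sample autocorrelation of x at lag k."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    num = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
    return num / denom


def ljung_box_q(residuals: Sequence[float], max_lag: int) -> float:
    """Ljung-Box Q statistic over lags 1 .. max_lag."""
    n = len(residuals)
    if not 1 <= max_lag < n:
        raise ValueError("max_lag must be between 1 and len(residuals) - 1")
    return n * (n + 2) * sum(
        autocorrelation(residuals, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )
```

A small Q (large P value) means there is no evidence of autocorrelation left in the residuals, which is the sense in which the fitted model is judged adequate.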
Table 6: Model statistics for Indian Journal of Community Medicine and Indian Journal of Public Health accession dataset
Discussion
People may visit the website for different reasons (information, transaction, or navigation), and the actual reason for the observed behavior may be difficult to infer. Time series models are particularly useful when little is known about the underlying process one is trying to predict. The accession of an academic journal website primarily depends upon the objective of the reader/scholar, the professional level of the researcher, the credibility and popularity of the author(s), and the innovative concept/research/idea or critical learning offered by a particular article/manuscript/document published by a journal.
Based on our analysis, it is concluded that the accession of these historically trusted, popular Indian journal websites will stagnate during 2015-2020. The myriad reasons could be maturity of readers, high publication standards of the journals leading to nonattraction of the majority of mediocre academicians/researchers/scholars/readers, mushrooming of similar but multiple journal publishing houses, general disinterest, etc. Another common reason personally observed, experienced, and noticed by the authors is that there is substantial delay in communicating the decision with regard to acceptance or rejection of a manuscript once the review process is initiated. This delay, as calculated by the authors, has a median of 6 months with a range of 3-15 calendar months (not shown in table), leading to frustration, despair, and anger toward the journal administration. Time to publication is yet another parameter that needs urgent attention by the esteemed journal colleagues. These delays, including noncommunication by journal management members, have over the past few years led to the emergence of a number of other journal publishing houses in the country that are also committed to quality but ensure timely and early decision making.
With increased access to the internet, the requirement of mandatory research publication for professional advancement, the availability of research training, funding options, and a positive research environment in the country, there has been a large spurt in online submission of publication documents/articles in recent years, which probably also burdens the limited capacity of editorial boards to communicate within a reasonable time frame. However, there could be other administrative and technical factors leading to delay. It is conceded, with a pinch of salt, that submitting authors may also be responsible for the delays. Therefore, effective and early communication between the primary journal stakeholders (author, editor, and reviewers) may play a pivotal role in the healthy development of journals in future.
This study may be useful in planning, most importantly toward raising publication standards to the next higher level, along with experimentation with innovative ideas to increase visibility and participation and to enhance and retain the user base. Significant advice, use, and reporting by these journals along the lines of the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) and Consolidated Standards of Reporting Trials (CONSORT) guidelines may attract more global readers, with higher citation, impact, and credibility.
This is probably the first ever study carried out on Indian public health journal websites with the application of time series analysis, and within its limitations, a straightforward answer cannot be offered to a complex yet evolving phenomenon. Therefore, for healthy growth of journals and publishing authors (elite vs. mediocre) in our country, quality checks and balances will continue to dominate in the near future. In the light of the above discussion, journal management could consider increasing the annual frequency of publication from 4 (quarterly) to 6 (bimonthly) or 12 (monthly) issues to retain and expand the user base (authors/readers) without compromising quality.
In a digital era with increasing internet penetration in the country, has the time arrived to phase out the print version of journals? This decision may not be easy and would require introspection, discussion with stakeholders, and articulation of a roadmap. Future research could include in-depth analysis of online readership behavior by systematically applying newer tools and technology (web analytics, metrics) to these academic journal websites.
Financial support and sponsorship
Conflicts of interest
There are no conflicts of interest.
References
Pandey M. Basic and Advanced Biostatistics. New Delhi: MV Learning; 2015.
Montgomery DC, Jennings CL, Kulahci M. Introduction to Time Series Analysis and Forecasting. USA: John Wiley & Sons, Inc., Publications; 2008.
Shumway RH, Stoffer DS. Time Series Analysis and Its Applications. New York: Springer; 2011.
Kavinga HW, Jayasundara DD, Dushantha Jayakody NK. A new dengue outbreak statistical model using the time series analysis. Eur Int Sci Technol 2013;2:35-52.
Allard R. Use of time-series analysis in infectious disease surveillance. Bull World Health Organ 1998;76:327-33.
Shumway RH, Stoffer DS. Time Series Analysis and Its Applications. New York: Springer; 2006.
Jeelani A, Malik W, Haq I, Aleem S, Mujtaba M, Syed N. Cross-sectional studies published in Indian Journal of Community Medicine: Evaluation of adherence to strengthening the reporting of observational studies in epidemiology statement. Ann Med Health Sci Res 2014;4:875-8.