Document Description

INPUT MODELING TECHNIQUES FOR DISCRETE-EVENT SIMULATIONS
Lawrence Leemis
Department of Mathematics
The College of William & Mary
Williamsburg, VA 23187–8795, U.S.A.
ABSTRACT
Most discrete-event simulation models have stochastic
elements that mimic the probabilistic nature of the sys-
tem under consideration. A close match between the in-
put model and the true underlying probabilistic mech-
anism associated with the system is required for suc-
cessful input modeling. Th

Document Share

Document Tags

Document Transcript

INPUT MODELING TECHNIQUES FOR DISCRETE-EVENT SIMULATIONS
Lawrence LeemisDepartment of MathematicsThe College of William & MaryWilliamsburg, VA 23187–8795, U.S.A.
ABSTRACT
Most discrete-event simulation models have stochasticelements that mimic the probabilistic nature of the sys-tem under consideration. A close match between the in-put model and the true underlying probabilistic mech-anism associated with the system is required for suc-cessful input modeling. The general question consid-ered here is how to model an element (e.g., arrival pro-cess, service times) in a discrete-event simulation givena data set collected on the element of interest. Forbrevity, it is assumed that data is available on the as-pect of the simulation of interest. It is also assumedthat raw data is available, as opposed to censored data,grouped data, or summary statistics. This example-driven tutorial examines introductory techniques for in-put modeling. Most simulation texts (e.g., Law andKelton 2000) have a broader treatment of input model-ing than presented here. Nelson and Yamnitsky (1998)survey advanced techniques.
1 DATA COLLECTION
There are two approaches that arise with respect to thecollection of data. The ﬁrst is the classical approach,where a designed experiment is conducted to collect thedata. The second is the exploratory approach, wherequestions are addressed by means of existing data thatthe modeler had no hand in collecting. The ﬁrst ap-proach is better in terms of control and the second ap-proach is generally better in terms of cost.Collecting data on the appropriate elements of thesystem of interest is one of the initial and pivotal stepsin successful input modeling. An inexperienced mod-eler, for example, collects wait times on a single-serverqueue when waiting time is the performance measureof interest. Although these wait times are valuable formodel validation, they do not contribute to the inputmodel. The appropriate data elements to collect foran input model for a single-server queue are typicallyarrival and service times. An analysis of sample datacollected on such a queue is given in Sections 3.1 and3.2.Even if the decision to sample the appropriate el-ement is made correctly, Bratley, Fox, and Schrage(1987) warn that there are several things that can be“wrong” about the data set. Vending machine sales willbe used to illustrate the diﬃculties.
ã
Wrong amount of aggregation. We desire to modeldaily sales, but have only monthly sales.
ã
Wrong distribution in time. We have sales for thismonth and want to model next month’s sales.
ã
Wrong distribution in space. We want to modelsales at a vending machine in location A, but onlyhave sales ﬁgures on a vending machine at locationB.
ã
Censored data. We want to model
demand
, but weonly have
sales
data. If the vending machine eversold out, this constitutes a right-censored observa-tion. The reliability and biostatistical literaturecontains techniques for accommodating censoreddata sets (Lawless 1982).
ã
Insuﬃcient distribution resolution. We want thedistribution of number the of soda cans sold at aparticular vending machine, but our data is givenin cases, eﬀectively rounding the data up to thenext multiple of 24.
2 INPUT MODELING TAXONOMY
Figure 1 contains a taxonomy illustrating the scope of potential input models available to simulation analysts.Modelers too often restrict their choice of input modelsto the top two branches. There is certainly no unique-ness in the branching structure chosen for the taxon-omy. The branches under
stochastic processes
, for ex-ample, could have been
state
followed by
time
, ratherthan
time
followed by
state
, as presented.
UnivariateMultivariateStochastic ProcessesDiscrete-timeContinuous-timeTime-independentmodelsDiscreteContinuousMixedContinuous-stateDiscrete-stateStationaryNonstationaryDiscrete-stateContinuous-stateStationaryStationaryStationaryNonstationaryNonstationaryNonstationaryInput ModelsBinomial(
n
,
p
)Normal(
µ
,
σ
2
)ContinuousMixedDiscreteEmpirical / Trace-drivenNormal(
µ
, Σ)Degenerate(
c
)Exponential(Λ)ARMA(
p
,
q
)ARIMA(
p
,
d
,
q
)Nonhomogeneous PoissonprocessBezier curveMarkov chainPoisson process(
λ
)Renewal processSemi-Markov chainMarkov processIndependent binomial(
n
,
p
)Bivariate exponential(
λ
1
,
λ
2
,
λ
12
)
Figure 1: A Taxonomy for Input Models
Examples of speciﬁc models that could be placed onthe branches of the taxonomy appear at the far rightof the diagram. Mixed, univariate, time-independentinput models have “empirical/trace-driven” given as apossible model. All of the branches include this particu-lar model. A
trace-driven
input model simply generatesa process that is identical to the collected data values soas not to rely on a parametric model. A simple exampleis a sequence of arrival times collected over a 24-hourtime period. The trace-driven input model for the ar-rival process is generated by having arrivals occur atthe same times as the observed values.The upper half of the taxonomy contains modelsthat are independent of time. These models could havebeen referred to as
Monte Carlo
models. Models areclassiﬁed by whether there is one or several variablesof interest, and whether the distribution of these ran-dom variables is discrete, continuous, or contains bothcontinuous and discrete elements. Examples of univari-ate discrete models include the binomial distributionand a degenerate distribution with all of its mass atone value. Examples of continuous distributions includethe normal distribution and an exponential distributionwith a random parameter Λ (see, for example, Martzand Waller 1982). B´ezier curves (Flanigan–Wagner andWilson 1993) oﬀer a unique combination of the para-metric and nonparametric approaches. An initial dis-tribution is ﬁtted to the data set, then the modelerdecides whether diﬀerences between the empirical andﬁtted models represent sampling variability or an as-pect of the distribution that should be included in theinput model.Examples of
k
-variable multivariate input models(Johnson 1987, Wilson 1997) include a sequence of
k
independent binomial random variables, a multivari-ate normal distribution with mean
µ
and variance-covariance matrix Σ and a bivariate exponential dis-tribution (Barlow and Proschan 1981).The lower half of the taxonomy contains stochas-tic process models. These models are often used tosolve problems at the system level, in addition toserving as input models for simulations with stochas-tic elements. Models are classiﬁed by how time ismeasured (discrete/continuous), the state space (dis-crete/continuous) and whether the model is station-ary in time. For Markov models, the discrete-state/continuous-state branch typically determines whetherthe model will be called a “chain” or a “process”,and the stationary/nonstationary branch typically de-termines whether the model will be preceded with theterm “homogeneous” or “nonhomogeneous”. Exam-ples of discrete-time stochastic processes include ho-mogeneous, discrete-time Markov chains (Ross 1997)and ARIMA time series models (Box and Jenkins1976). Since point processes are counting processes,they have been placed on the continuous-time, discrete-space branch.In conclusion, modelers are too often limited tounivariate, stationary models since software is typicallywritten for ﬁtting distributions to these models. Suc-cessful input modeling requires knowledge of the fullrange of possible probabilistic input models.
3 EXAMPLES
Two simple examples illustrate the types of decisionsthat often arise in input modeling. The ﬁrst exampledetermines an input model for service times and the sec-ond example determines an input model for an arrivalprocess.
3.1 Service Time Model
Consider a data set of
n
= 23 service times collected todetermine an input model in a discrete-event simulationof a queuing system. The service times in seconds are105.84 28.92 98.64 55.56 128.04 45.6067.80 105.12 48.48 51.84 173.40 51.9654.12 68.64 93.12 68.88 84.12 68.6441.52 127.92 42.12 17.88 33.00.[Although these service times come from the life testingliterature (Lawless 1982, p. 228), the same principlesapply to both input modeling and survival analysis.]The ﬁrst step is to assess whether the observationsare independent and identically distributed (iid). Thedata must be given in the order collected for indepen-dence to be assessed. Situations where the iid assump-tion would
not
be valid include:
ã
A new teller has been hired at a bank and the23 service times represent a task that has a steeplearning curve. The expected service time is likelyto decrease as the new teller learns how to performthe task more eﬃciently.
ã
The service times represent 23 times to completionof a physically demanding task during an 8-hourshift. If fatigue is a signiﬁcant factor, the expectedtime to complete the task is likely to increase withtime.If a simple linear regression of the observation num-bers versus the service times shows a signiﬁcant nonzeroslope, then the iid assumption is probably not appro-priate.Assume that there is a suspicion that a learningcurve is present, which makes a modeler suspect that
the service times are decreasing. One appropriate hy-pothesis test is
H
0
:
β
1
= 0versus
H
1
:
β
1
<
0associated with the linear model (Neter, Wasserman,and Kutner 1989)
Y
=
β
0
+
β
1
X
+
,
where
X
is the observation number,
Y
is the servicetime,
β
0
is the intercept,
β
1
is the slope, and
is anerror term. Figure 2 shows a plot of the (
x
i
,y
i
) pairsfor
i
= 1
,
2
,...,
23, along with the estimated regressionline. The
p
-value associated with the hypothesis testis 0.14, which is not enough evidence to conclude thatthere is a statistically signiﬁcant learning curve present.The negative slope is likely due to sampling variability.The
p
-value may, however, be small enough to warrantfurther data collection.
ããããããããããããããããããããããã5 10 15 20050100150ObservationNumberServiceTime
Figure 2: Service Time Vs. Observation NumberThere are a number of other graphical and statisti-cal methods for assessing independence. These includeanalysis of the sample autocorrelation function associ-ated with the observations and a scatterplot of adjacentobservations (Law and Kelton 2000). The sample au-tocorrelation function (ACF) for the service times isplotted in Figure 3 for the ﬁrst ten lags. The sampleACF value at lag 1, for example, is the sample cor-relation for adjacent service times. The sample ACFvalue at lag 4, for example, is the sample correlationfor service times four customers apart. The horizontaldotted lines at
±
2
√
n
are 95% bounds used to determinewhether the spikes in the ACF are statistically signiﬁ-cant. None were statistically signiﬁcant for the servicetime data. For this particular example, assume that weare satisﬁed that the observations are truly iid in orderto perform a classical statistical analysis.
Lag
A C F
0 1 2 3 4 5 6 7 8 9 10-0.4-0.20.00.20.40.60.81.0
Figure 3: Sample Autocorrelation FunctionThe next step in the analysis of this data set in-cludes plotting a histogram and calculating the valuesof some sample statistics. A histogram of the observa-tions is shown in Figure 4. Although the data set issmall, a skewed bell-shaped pattern is apparent. Thelargest observation lies in the far right-hand tail of thedistribution, so care must be taken to assure that itis representative of the population. The sample mean,standard deviation, coeﬃcient of variation, and skew-ness are¯
x
= 72
.
22
s
= 37
.
49
s
¯
x
= 0
.
521
n
n
i
=1
x
i
−
¯
xs
3
= 0
.
88
.
Examples of the interpretations of these sample statis-tics are:
ã
A coeﬃcient of variation
s/
¯
x
close to 1, along withthe appropriate histogram shape, indicates thatthe exponential distribution is a potential inputmodel.
ã
A sample skewness close to 0 indicates that a sym-metric distribution (e.g., a normal or uniform dis-tribution) is a potential input model.The next decision that needs to be made is whethera parametric or nonparametric input model should beused. One simple nonparametric model would repeat-edly select one of the service times with probability

Similar documents

We Need Your Support

Thank you for visiting our website and your interest in our free products and services. We are nonprofit website to share and download documents. To the running of this website, we need your help to support us.

Thanks to everyone for your continued support.

No, Thanks

SAVE OUR EARTH

We need your sign to support Project to invent "SMART AND CONTROLLABLE REFLECTIVE BALLOONS" to cover the Sun and Save Our Earth.

More details...Sign Now!

We are very appreciated for your Prompt Action!

x