This document presents a test for detecting a single upper outlier in a sample from a Johnson SB distribution when the parameters of the distribution are unknown. The proposed test statistic is based on maximum likelihood estimates of the four parameters (location, scale, and two shape parameters) of the Johnson SB distribution. Critical values of the test statistic are obtained through simulation for different sample sizes. The performance of the test is investigated through simulation, showing it performs well at detecting outliers when the contaminant observation represents a large shift from the original distribution parameters. An example application to census data is also provided.
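The paper's statistic uses maximum likelihood estimates of all four Johnson SB parameters, which is beyond a short sketch. As a hedged illustration of the simulation step only, the following treats the shape parameters as known and tabulates a Monte Carlo critical value for a Grubbs-type maximum statistic under a Johnson SB null. The function name, the choice of statistic, and the parameter values are illustrative assumptions, not the paper's method.

```python
import numpy as np
from scipy.stats import johnsonsb

def simulate_critical_value(a, b, n, alpha=0.05, reps=5000, seed=0):
    """Approximate the upper critical value of a Grubbs-type statistic
    T = (x_(n) - x_bar) / s by Monte Carlo under a Johnson SB null.
    NOTE: a simplified sketch -- the paper's statistic instead plugs in
    MLEs of all four parameters, which are treated as known here."""
    rng = np.random.default_rng(seed)
    x = johnsonsb.rvs(a, b, size=(reps, n), random_state=rng)
    t = (x.max(axis=1) - x.mean(axis=1)) / x.std(axis=1, ddof=1)
    return np.quantile(t, 1.0 - alpha)
```

Tabulating this quantile over a grid of sample sizes reproduces the kind of critical-value table the abstract describes, at the cost of ignoring the extra variability introduced by estimating the parameters.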
International Journal of Mathematics and Statistics Invention (IJMSI) is an international journal intended for professionals and researchers in all fields of mathematics and statistics. IJMSI publishes research articles and reviews across the whole field of Mathematics and Statistics, including new teaching methods, assessment, validation, and the impact of new technologies, and it continues to provide information on the latest trends and developments in this ever-expanding subject. Papers are selected through double peer review to ensure originality, relevance, and readability. The articles published in the journal can be accessed online.
BioSHaRE: Analysis of mixed effects models using federated data analysis appr... (Lisette Giepmans)
BioSHaRE conference July 28th, 2015, Milan - Latest tools and services for data sharing
Stream 3: Study application and results
Contact info:
Prof. Edwin van den Heuvel
Eindhoven University of Technology
e.r.v.d.heuvel@tue.nl
key words: biobank, bioshare, cohort, data sharing, epidemiology, harmonisation, statistics
Lesson 27: Using statistical techniques in analyzing data (mjlobetos)
The document discusses statistical techniques for analyzing data, including scatter diagrams, correlation coefficients, regression analysis, and chi-square tests. It provides examples of using scatter diagrams to visualize the relationship between two variables, calculating the Pearson correlation coefficient to determine the strength of linear relationships, and using simple linear regression to find the regression equation that best predicts a dependent variable from an independent variable. It also explains how to perform a chi-square test to analyze relationships between categorical variables by comparing observed and expected frequencies.
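The correlation and regression calculations this summary mentions reduce to a few closed-form expressions. As a minimal numpy sketch (function names are illustrative), the Pearson coefficient is the centered dot product scaled by both standard deviations, and the least-squares line follows from the same centered sums:

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation: covariance scaled by the two standard deviations."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return (xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc))

def least_squares_line(x, y):
    """Simple linear regression y = a + b*x: slope from centered sums,
    intercept forced through the point of means."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc = x - x.mean()
    b = (xc @ (y - y.mean())) / (xc @ xc)
    a = y.mean() - b * x.mean()
    return a, b
```

For example, data lying exactly on y = 2x + 1 gives r = 1 and recovers intercept 1 and slope 2.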
Non-parametric analysis: Wilcoxon, Kruskal Wallis & Spearman (Azmi Mohd Tamil)
This document discusses non-parametric statistical tests including the Wilcoxon rank sum test, Kruskal-Wallis test, and Spearman/Kendall correlation. It provides an overview of when to use these tests, their assumptions, procedures, advantages and disadvantages. Examples are given to illustrate how to perform the Wilcoxon rank sum test, Kruskal-Wallis test, and Wilcoxon signed rank test step-by-step. SPSS output is also shown for these tests.
The document describes the Wilcoxon rank-sum test, a non-parametric hypothesis test used to assess whether one of two independent samples tends to have larger values than the other when normality cannot be assumed. It details how to run the test: rank the combined observations, compute the test statistic, and reject the null hypothesis if the statistic is less than or equal to the critical value. An example applies the test to compare the nicotine content of two cigarette brands, finding no significant difference between their medians.
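The ranking procedure described above can be sketched directly. This is a hedged illustration using the large-sample normal approximation rather than exact critical-value tables (the function name is an assumption); small samples would use tabulated critical values as the document describes:

```python
import numpy as np
from scipy.stats import rankdata, norm

def rank_sum_test(x, y):
    """Wilcoxon rank-sum test via the normal approximation.
    Returns (W, p), where W is the sum of ranks of the first sample
    within the combined, jointly ranked data (ties get average ranks)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n1, n2 = len(x), len(y)
    ranks = rankdata(np.concatenate([x, y]))
    w = ranks[:n1].sum()
    mu = n1 * (n1 + n2 + 1) / 2.0                    # E[W] under H0
    sigma = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)  # SD[W] under H0
    z = (w - mu) / sigma
    return w, 2.0 * norm.sf(abs(z))                  # two-sided p-value
```

Two clearly separated samples yield a small p-value; interleaved samples do not.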
The chapter discusses analysis of variance (ANOVA), including one-way and two-way ANOVA tests. It outlines the goals of understanding when to use ANOVA, different ANOVA designs, how to perform single-factor hypothesis tests and interpret results, conduct post-hoc multiple comparisons procedures, and analyze two-factor ANOVA tests. The key aspects covered include partitioning total variation into between-group and within-group variation, calculating sum of squares, mean squares, and F statistics to test for differences between group means. Post-hoc procedures like Tukey-Kramer are also introduced to determine which specific group means are significantly different from each other.
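The partition of total variation into between-group and within-group sums of squares, and the F statistic built from their mean squares, can be written compactly. A minimal sketch (the function name is illustrative):

```python
import numpy as np
from scipy.stats import f

def one_way_anova(*groups):
    """One-way ANOVA: partition total variation into between-group and
    within-group sums of squares, form mean squares, and test with F."""
    groups = [np.asarray(g, float) for g in groups]
    all_obs = np.concatenate(groups)
    grand = all_obs.mean()
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_b, df_w = len(groups) - 1, len(all_obs) - len(groups)
    f_stat = (ss_between / df_b) / (ss_within / df_w)  # MSB / MSW
    return f_stat, f.sf(f_stat, df_b, df_w)
```

Identical group means give F = 0 (p = 1); widely separated group means give a large F and a tiny p-value, which is when a post-hoc procedure such as Tukey-Kramer would be applied to locate the differing pairs.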
This document provides an overview of key concepts in statistics as they relate to environmental sampling and analysis. It defines common statistical terms like mean, median, mode, variance, standard deviation, and normal distribution. It discusses population vs. sample, random variables, and the use of histograms and box plots to visualize data. Key aspects of accuracy, precision, and experimental error are covered. The document also introduces concepts like linear regression, correlation, and their uses in environmental analysis. Estimating mean and variance from a sample is discussed along with the use of α values in determining confidence intervals for probability distributions.
This document discusses the assumptions of ANOVA and methods for addressing violations of those assumptions, including data transformations and non-parametric tests. It notes that the normality assumption in ANOVA pertains to the residuals rather than the response variable. Various transformations are presented that can help achieve normality when the residuals are not normally distributed, including logarithmic, square root, arcsine-square root, and reciprocal transformations. Non-parametric alternatives to ANOVA like the Wilcoxon rank sum test and Kruskal-Wallis test are also introduced. An example using infection rate data from birds in different landscapes is provided to demonstrate checking assumptions and applying transformations if needed.
Two-way ANOVA has many of the same ideas as one-way ANOVA, with the main difference being the inclusion of another factor (or explanatory variable) in our model.
In the two-way ANOVA model, there are two factors, each with its own number of levels. When we are interested in the effects of two factors, it is much more advantageous to perform a two-way analysis of variance, as opposed to two separate one-way ANOVAs.
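For a balanced design with replication, the two-way decomposition splits the total sum of squares into main effects for each factor, their interaction, and error. A hedged sketch (the function name and the assumption of a balanced layout are illustrative choices, not a fixed convention):

```python
import numpy as np
from scipy.stats import f

def two_way_anova(y):
    """Balanced two-way ANOVA with replication.  y has shape (a, b, r):
    a levels of factor A, b levels of factor B, r replicates per cell.
    Returns a dict of (F, p) for A, B, and the A x B interaction."""
    y = np.asarray(y, float)
    a, b, r = y.shape
    grand = y.mean()
    mean_a = y.mean(axis=(1, 2))          # factor A level means
    mean_b = y.mean(axis=(0, 2))          # factor B level means
    cell = y.mean(axis=2)                 # cell means
    ss_a = b * r * ((mean_a - grand) ** 2).sum()
    ss_b = a * r * ((mean_b - grand) ** 2).sum()
    ss_ab = r * ((cell - mean_a[:, None] - mean_b[None, :] + grand) ** 2).sum()
    ss_e = ((y - cell[:, :, None]) ** 2).sum()
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "E": a * b * (r - 1)}
    mse = ss_e / df["E"]
    return {name: ((ss / df[name]) / mse,
                   f.sf((ss / df[name]) / mse, df[name], df["E"]))
            for name, ss in [("A", ss_a), ("B", ss_b), ("AB", ss_ab)]}
```

A purely additive response (no interaction) gives an interaction F of exactly zero, which is one way to see why the interaction term is tested first.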
This document provides an introduction and overview of analysis of variance (ANOVA). It discusses one-way and two-way ANOVA. For one-way ANOVA, it defines the technique, provides notation for hypotheses testing, and works through an example comparing sales data from three marketing strategy groups. It notes the limitations of one-way ANOVA for this example and introduces two-way ANOVA as a way to analyze the effects of two factors - marketing strategy and advertising media. Two-way ANOVA allows testing of differences in means for each factor and any interactions between factors.
This document provides an overview of analysis of variance (ANOVA). It introduces ANOVA and its key concepts, including its development by Ronald Fisher. It defines ANOVA and distinguishes between one-way and two-way ANOVA. It outlines the assumptions, techniques, and examples of how to perform one-way and two-way ANOVA. It also discusses the uses, advantages, and limitations of ANOVA for analyzing differences between multiple means and factors.
This document provides an overview of experimental design and analysis of variance (ANOVA). It defines key terms like independent and dependent variables, experimental units, treatments, and blocks. It explains different types of experimental designs like completely randomized designs, randomized block designs, and factorial experiments. It also covers ANOVA computations and assumptions for one-way and randomized block ANOVA models. Multiple comparison procedures like Tukey's HSD are introduced to identify differences between specific treatment means. Examples are provided to demonstrate applications of one-way and randomized block ANOVA.
This document provides an overview of analysis of variance (ANOVA) techniques. It discusses one-way ANOVA, which evaluates differences between three or more population means. Key aspects covered include partitioning total variation into between- and within-group components, assumptions of normality and equal variances, and using the F-test to test for differences. Randomized block ANOVA and two-factor ANOVA are also introduced as extensions to control for additional variables. Post-hoc tests like Tukey and Fisher's LSD are described for determining specific mean differences.
This document discusses two-way analysis of variance (ANOVA), which analyzes the relationship between two categorical independent variables and a continuous dependent variable. It provides an example using IQ scores categorized by sex and blood lead level. Two-way ANOVA tests for an interaction effect between the factors and also tests whether each factor individually has an effect. In this example, there is no significant interaction effect or individual effects of sex or blood lead level on IQ scores.
This lab report summarizes an analysis of normality for various variables in a dataset. Histograms and normal quantile plots were used to visually assess whether age, weight, height, blood pressure, salary, and charity amounts followed a normal distribution. Variables that were non-normal were transformed using mathematical functions to achieve a normal distribution. The transformed variables and BMI, which was normally distributed, were then analyzed. A random normal distribution was simulated and compared to the true BMI distribution, finding the probabilities were close.
The chi-square test is used to determine if an observed frequency distribution differs from an expected theoretical distribution. It can test goodness of fit, independence of attributes, and homogeneity. The test involves calculating chi-square by taking the sum of the squares of the differences between observed and expected frequencies divided by expected frequencies. For the test to be valid, certain conditions must be met regarding sample size, expected frequencies, independence, and randomness. The test has some limitations such as not measuring strength of association and being unreliable with small expected frequencies.
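The calculation described, summing squared observed-minus-expected differences scaled by the expected frequencies, is a one-liner. A minimal goodness-of-fit sketch (the function name is illustrative; the validity conditions on expected counts mentioned above still apply):

```python
import numpy as np
from scipy.stats import chi2

def chi_square_gof(observed, expected):
    """Chi-square goodness of fit: sum of (O - E)^2 / E, referred to a
    chi-square distribution with k - 1 degrees of freedom."""
    o = np.asarray(observed, float)
    e = np.asarray(expected, float)
    stat = ((o - e) ** 2 / e).sum()
    return stat, chi2.sf(stat, len(o) - 1)
```

For instance, observed counts [8, 12, 10, 10] against a uniform expectation of 10 per cell give a statistic of 0.8 on 3 degrees of freedom, nowhere near significance.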
The Student's t-test is used to compare the means of two samples and determine if they are statistically different. In an example, biomass measurements were taken from four replicates each of bacterium A and bacterium B. The mean, variance, and t-value were calculated and found to exceed the critical t-value at a significance level of 5%, indicating the means were significantly different. Therefore, the null hypothesis that the samples do not differ can be rejected.
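The pooled two-sample t calculation used in examples like the bacterium comparison follows the standard equal-variance formula. A minimal sketch (the function name and sample values in the test are illustrative, not the document's data):

```python
import numpy as np
from scipy.stats import t

def pooled_t_test(x, y):
    """Two-sample Student's t-test assuming equal variances: pool the two
    sample variances, then refer t to n1 + n2 - 2 degrees of freedom."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * x.var(ddof=1) + (n2 - 1) * y.var(ddof=1)) / (n1 + n2 - 2)
    t_stat = (x.mean() - y.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t_stat, 2.0 * t.sf(abs(t_stat), n1 + n2 - 2)
```

When |t| exceeds the critical value at the chosen significance level (equivalently, when p falls below it), the null hypothesis of equal means is rejected, as in the example above.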
Development of a Spatial Path-Analysis Method for Spatial Data Analysis (IJECEIAES)
Path analysis is a method for identifying and analyzing direct and indirect relationships between independent and dependent variables. The method was developed by Sewall Wright and initially used only correlation analysis results to identify the variables' relationships. So far, path analysis has mostly been applied to non-spatial data. When the variables involve spatial dependency, path analysis can produce a less precise model. It is therefore necessary to build a path analysis model that can identify and account for the effects of spatial dependencies. Spatial autocorrelation and spatial regression methods can be used to enhance path analysis so that it captures these effects. This paper proposes a method derived from path analysis that can process data with spatial elements and can be used to identify and analyze spatial effects in the data; the authors call this method spatial path analysis.
This document provides an overview of analysis of variance (ANOVA). It discusses two-way ANOVA and the design of experiments (DOE) including completely randomized design (CRD) and randomized block design (RBD). CRD is the simplest design where treatments are randomly allocated without blocking. RBD uses blocking to reduce experimental error by making comparisons only between treatments within the same block. The document provides formulas and examples for calculating ANOVA tables for one-way and two-way ANOVA to test for differences between sample means.
This chapter discusses analysis of variance (ANOVA) techniques. It covers one-way and two-way ANOVA designs, and how to perform and interpret the results of one-way and two-way ANOVA tests. It also discusses how to partition total variation into within-group and between-group components, calculate mean squares, obtain F-statistics, and make inferences about differences in group means. Multiple comparison procedures for identifying which specific group means differ are also introduced.
This document discusses strategies for designing factorial experiments with multiple factors. It explains that factorial experiments involve studying the effect of varying levels of factors on a response variable. The optimal design strategy depends on whether the circumstances are unusual or normal. For normal circumstances where there is some noise and factors influence each other, a fractional factorial or full factorial design is typically best. The document provides details on analyzing the data from factorial experiments to determine if factor effects and interactions are significant. It includes examples of calculating main effects and interactions from 2-level factorial data.
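For a 2-level factorial design, the main-effect and interaction calculations the document describes are simple contrasts on the coded -1/+1 columns. A minimal sketch (function names are illustrative):

```python
import numpy as np

def main_effect(response, level):
    """Main effect in a 2-level factorial: mean response at the high (+1)
    level minus mean response at the low (-1) level of the factor."""
    response = np.asarray(response, float)
    level = np.asarray(level)
    return response[level == 1].mean() - response[level == -1].mean()

def interaction_effect(response, level_a, level_b):
    """Two-factor interaction: the main effect of the product column A*B."""
    return main_effect(response, np.asarray(level_a) * np.asarray(level_b))
```

For a full 2^2 design with response y = 10 + 2A + 3B + AB, the estimated effects are twice the model coefficients: 4 for A, 6 for B, and 2 for AB.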
This document provides information on performing a one-way analysis of variance (ANOVA). It discusses the F-distribution, key terms used in ANOVA like factors and treatments, and how to calculate and interpret an ANOVA test statistic. An example demonstrates how to conduct a one-way ANOVA to determine if three golf clubs produce different average driving distances.
The document discusses analysis of variance (ANOVA), a statistical technique developed by R.A. Fisher in 1920 to analyze the differences between group means and their associated procedures. It can be used when there are two or more samples to study the significance of differences between their mean values. ANOVA works by decomposing the overall variability into different sources and comparing the relative sizes of different variances. It is useful for research in fields like agriculture, biology, pharmacy, and more.
This document discusses metrics for assessing the predictability and efficiency of covariate-adaptive randomization designs in clinical trials. It proposes measuring predictability using a modified Blackwell-Hodges potential selection bias metric that calculates how well an observer could guess the next treatment assignment. It also considers entropy and periodicity measures. Balance/efficiency is proposed to be measured using Atkinson's method of quantifying the loss of statistical power as an equivalent reduction in sample size due to treatment imbalances within subgroups. The document then outlines a simulation study to compare various randomization methods using these proposed metrics.
Efficiency of ratio and regression estimators using double sampling (Alexander Decker)
This document discusses different sampling methods for estimating population parameters, including ratio estimation, regression estimation, and simple random sampling without replacement. It compares the efficiency of using double sampling for ratio estimation versus double sampling for regression estimation versus simple random sampling without replacement. The key findings are that double sampling for regression estimation is more efficient than double sampling for ratio estimation or simple random sampling without replacement if the regression line does not pass through the origin. Relative efficiency and coefficient of variation are used to empirically compare the different estimators.
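The two estimators being compared have short closed forms. As a hedged single-phase sketch (double sampling adds a first-phase estimate of the auxiliary mean, omitted here for brevity; function names are illustrative):

```python
import numpy as np

def ratio_estimate(y, x, x_pop_mean):
    """Ratio estimator of the population mean of y: ybar * (Xbar / xbar)."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    return y.mean() * x_pop_mean / x.mean()

def regression_estimate(y, x, x_pop_mean):
    """Linear regression estimator: ybar + b * (Xbar - xbar), where b is
    the least-squares slope of y on x."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    xc = x - x.mean()
    b = (xc @ (y - y.mean())) / (xc @ xc)
    return y.mean() + b * (x_pop_mean - x.mean())
```

This makes the abstract's finding concrete: when the relationship between y and x has a nonzero intercept (the line does not pass through the origin), the regression estimator adjusts exactly while the ratio estimator is biased.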
Statistics is the science of dealing with numbers and data. It involves collecting, summarizing, presenting, and analyzing data. There are four main steps: data collection, summarization by removing unwanted data and classifying/tabulating, presentation with diagrams/graphs/tables, and analysis using measures like average, dispersion, and correlation. Descriptive statistics summarize and describe data, while inferential statistics allow generalizing from samples to populations. Common descriptive statistics include measures of central tendency (mean, median, mode), variability (range, variance, standard deviation), and distribution properties. Inferential statistics techniques like hypothesis testing and ANOVA are used to make inferences about populations based on samples.
Austin Statistics is an open access, peer-reviewed, scholarly journal dedicated to publishing articles in all areas of statistics.
The aim of the journal is to provide a forum for scientists, academicians, and researchers to find the most recent advances in the field of statistics.
Austin Statistics accepts original research articles, review articles, case reports, and rapid communications on all aspects of statistics.
A note on estimation of population mean in sample survey using auxiliary info... (Alexander Decker)
1. The document proposes a class of estimators for estimating the population mean in two-phase sampling using auxiliary information.
2. Some common estimators like the ratio, product, and regression estimators are special cases within the proposed class. Expressions for bias and mean squared error of the estimators are obtained up to the first order of approximation.
3. Asymptotically optimum estimators are identified that have minimum mean squared error. The proposed class of estimators is found to perform better than usual ratio and other estimators for population mean estimation.
A Study of Some Tests of Uniformity and Their Performances (IOSRJM)
This document discusses a study that evaluates the performance of eleven tests for uniformity. The tests include Kolmogorov-Smirnov, Anderson-Darling, Cramer-von Mises, Watson, Sukhatme, probability product, Kuiper, Gini, ZhangA, ZhangC tests. Through simulation, the power of these tests is assessed under various sample sizes and five alternative distributions. The results are displayed in tables and graphs. The document concludes that the performance of the tests depends on the sample size and nature of the alternative distribution.
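The simplest of the tests compared, Kolmogorov-Smirnov, measures the largest gap between the empirical CDF and the Uniform(0, 1) line. A minimal numpy sketch of the statistic only (the function name is illustrative; the study's power comparisons also require the null distribution of D, omitted here):

```python
import numpy as np

def ks_uniform_statistic(u):
    """One-sample Kolmogorov-Smirnov statistic against Uniform(0, 1):
    the largest vertical gap between the empirical CDF and F(u) = u,
    checked just before and just after each order statistic."""
    u = np.sort(np.asarray(u, float))
    n = len(u)
    ecdf_hi = np.arange(1, n + 1) / n   # ECDF value just after each point
    ecdf_lo = np.arange(0, n) / n       # ECDF value just before each point
    return max((ecdf_hi - u).max(), (u - ecdf_lo).max())
```

Evenly spread points give a small D, while points clustered in one corner of [0, 1] give a D near 1, which is what drives the power differences the study tabulates across alternative distributions.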
Two-way ANOVA has many of the same ideas as one-way ANOVA, with the main difference being the inclusion of another factor (or explanatory variable) in our model.
In the two-way ANOVA model, there are two factors, each with its own number of levels. When we are interested in the effects of two factors, it is much more advantageous to perform a two-way analysis of variance, as opposed to two separate one-way ANOVAs.
This document provides an introduction and overview of analysis of variance (ANOVA). It discusses one-way and two-way ANOVA. For one-way ANOVA, it defines the technique, provides notation for hypotheses testing, and works through an example comparing sales data from three marketing strategy groups. It notes the limitations of one-way ANOVA for this example and introduces two-way ANOVA as a way to analyze the effects of two factors - marketing strategy and advertising media. Two-way ANOVA allows testing of differences in means for each factor and any interactions between factors.
This document provides an overview of analysis of variance (ANOVA). It introduces ANOVA and its key concepts, including its development by Ronald Fisher. It defines ANOVA and distinguishes between one-way and two-way ANOVA. It outlines the assumptions, techniques, and examples of how to perform one-way and two-way ANOVA. It also discusses the uses, advantages, and limitations of ANOVA for analyzing differences between multiple means and factors.
This document provides an overview of experimental design and analysis of variance (ANOVA). It defines key terms like independent and dependent variables, experimental units, treatments, and blocks. It explains different types of experimental designs like completely randomized designs, randomized block designs, and factorial experiments. It also covers ANOVA computations and assumptions for one-way and randomized block ANOVA models. Multiple comparison procedures like Tukey's HSD are introduced to identify differences between specific treatment means. Examples are provided to demonstrate applications of one-way and randomized block ANOVA.
This document provides an overview of analysis of variance (ANOVA) techniques. It discusses one-way ANOVA, which evaluates differences between three or more population means. Key aspects covered include partitioning total variation into between- and within-group components, assumptions of normality and equal variances, and using the F-test to test for differences. Randomized block ANOVA and two-factor ANOVA are also introduced as extensions to control for additional variables. Post-hoc tests like Tukey and Fisher's LSD are described for determining specific mean differences.
This document discusses two-way analysis of variance (ANOVA), which analyzes the relationship between two categorical independent variables and a continuous dependent variable. It provides an example using IQ scores categorized by sex and blood lead level. Two-way ANOVA tests for an interaction effect between the factors and also tests whether each factor individually has an effect. In this example, there is no significant interaction effect or individual effects of sex or blood lead level on IQ scores.
This lab report summarizes an analysis of normality for various variables in a dataset. Histograms and normal quantile plots were used to visually assess whether age, weight, height, blood pressure, salary, and charity amounts followed a normal distribution. Variables that were non-normal were transformed using mathematical functions to achieve a normal distribution. The transformed variables and BMI, which was normally distributed, were then analyzed. A random normal distribution was simulated and compared to the true BMI distribution, finding the probabilities were close.
The chi-square test is used to determine if an observed frequency distribution differs from an expected theoretical distribution. It can test goodness of fit, independence of attributes, and homogeneity. The test involves calculating chi-square by taking the sum of the squares of the differences between observed and expected frequencies divided by expected frequencies. For the test to be valid, certain conditions must be met regarding sample size, expected frequencies, independence, and randomness. The test has some limitations such as not measuring strength of association and being unreliable with small expected frequencies.
The Student's t-test is used to compare the means of two samples and determine if they are statistically different. In an example, biomass measurements were taken from four replicates each of bacterium A and bacterium B. The mean, variance, and t-value were calculated and found to exceed the critical t-value at a significance level of 5%, indicating the means were significantly different. Therefore, the null hypothesis that the samples do not differ can be rejected.
Development of a Spatial Path-Analysis Method for Spatial Data AnalysisIJECEIAES
ย
Path analysis is a method for identifying and analyzing direct and indirect relationship be- tween independent and dependent variables. This method was developed by Sewal Wright and initially only used correlation analysis results in identifying the variablesโ relationship. So far, path analysis has been mostly used to deal with variables of non-spatial data type. When analyzing variables that have elements of spatial dependency, path analysis could result in a less precise model. Therefore, it is necessary to build a path analysis model that is able to identify and take into account the effects of spatial dependencies. Spatial autocorrelation and spatial regression methods can be used to enhance path analysis to identify the effects of spatial dependencies. This paper proposes a method derived from path analysis that can process data with spatial elements and furthermore can be used to identify and analyze the spatial effects on the data; we call this method spatial path analysis.
This document provides an overview of analysis of variance (ANOVA). It discusses two-way ANOVA and the design of experiments (DOE) including completely randomized design (CRD) and randomized block design (RBD). CRD is the simplest design where treatments are randomly allocated without blocking. RBD uses blocking to reduce experimental error by making comparisons only between treatments within the same block. The document provides formulas and examples for calculating ANOVA tables for one-way and two-way ANOVA to test for differences between sample means.
This chapter discusses analysis of variance (ANOVA) techniques. It covers one-way and two-way ANOVA designs, and how to perform and interpret the results of one-way and two-way ANOVA tests. It also discusses how to partition total variation into within-group and between-group components, calculate mean squares, obtain F-statistics, and make inferences about differences in group means. Multiple comparison procedures for identifying which specific group means differ are also introduced.
This document discusses strategies for designing factorial experiments with multiple factors. It explains that factorial experiments involve studying the effect of varying levels of factors on a response variable. The optimal design strategy depends on whether the circumstances are unusual or normal. For normal circumstances where there is some noise and factors influence each other, a fractional factorial or full factorial design is typically best. The document provides details on analyzing the data from factorial experiments to determine if factor effects and interactions are significant. It includes examples of calculating main effects and interactions from 2-level factorial data.
This document provides information on performing a one-way analysis of variance (ANOVA). It discusses the F-distribution, key terms used in ANOVA like factors and treatments, and how to calculate and interpret an ANOVA test statistic. An example demonstrates how to conduct a one-way ANOVA to determine if three golf clubs produce different average driving distances.
The document discusses analysis of variance (ANOVA), a statistical technique developed by R.A. Fisher in 1920 to analyze the differences between group means and their associated procedures. It can be used when there are two or more samples to study the significance of differences between their mean values. ANOVA works by decomposing the overall variability into different sources and comparing the relative sizes of different variances. It is useful for research in fields like agriculture, biology, pharmacy, and more.
This document discusses metrics for assessing the predictability and efficiency of covariate-adaptive randomization designs in clinical trials. It proposes measuring predictability using a modified Blackwell-Hodges potential selection bias metric that calculates how well an observer could guess the next treatment assignment. It also considers entropy and periodicity measures. Balance/efficiency is proposed to be measured using Atkinson's method of quantifying the loss of statistical power as an equivalent reduction in sample size due to treatment imbalances within subgroups. The document then outlines a simulation study to compare various randomization methods using these proposed metrics.
Efficiency of ratio and regression estimators using double sampling (Alexander Decker)
This document discusses different sampling methods for estimating population parameters, including ratio estimation, regression estimation, and simple random sampling without replacement. It compares the efficiency of using double sampling for ratio estimation versus double sampling for regression estimation versus simple random sampling without replacement. The key findings are that double sampling for regression estimation is more efficient than double sampling for ratio estimation or simple random sampling without replacement if the regression line does not pass through the origin. Relative efficiency and coefficient of variation are used to empirically compare the different estimators.
Statistics is the science of dealing with numbers and data. It involves collecting, summarizing, presenting, and analyzing data. There are four main steps: data collection, summarization by removing unwanted data and classifying/tabulating, presentation with diagrams/graphs/tables, and analysis using measures like average, dispersion, and correlation. Descriptive statistics summarize and describe data, while inferential statistics allow generalizing from samples to populations. Common descriptive statistics include measures of central tendency (mean, median, mode), variability (range, variance, standard deviation), and distribution properties. Inferential statistics techniques like hypothesis testing and ANOVA are used to make inferences about populations based on samples.
Austin Statistics is an open access, peer-reviewed, scholarly journal dedicated to publishing articles in all areas of statistics.
The aim of the journal is to provide a forum for scientists, academicians and researchers to find the most recent advances in the field of statistics.
Austin Statistics accepts original research articles, review articles, case reports and rapid communications on all aspects of statistics.
A note on estimation of population mean in sample survey using auxiliary info... (Alexander Decker)
1. The document proposes a class of estimators for estimating the population mean in two-phase sampling using auxiliary information.
2. Some common estimators like the ratio, product, and regression estimators are special cases within the proposed class. Expressions for bias and mean squared error of the estimators are obtained up to the first order of approximation.
3. Asymptotically optimum estimators are identified that have minimum mean squared error. The proposed class of estimators is found to perform better than usual ratio and other estimators for population mean estimation.
A Study of Some Tests of Uniformity and Their Performances (IOSRJM)
This document discusses a study that evaluates the performance of eleven tests for uniformity. The tests include Kolmogorov-Smirnov, Anderson-Darling, Cramer-von Mises, Watson, Sukhatme, probability product, Kuiper, Gini, ZhangA, ZhangC tests. Through simulation, the power of these tests is assessed under various sample sizes and five alternative distributions. The results are displayed in tables and graphs. The document concludes that the performance of the tests depends on the sample size and nature of the alternative distribution.
This document discusses sampling distributions and related statistical concepts. It defines descriptive and inferential statistics, and explains that inferential statistics uses samples to draw conclusions about populations. Key concepts covered include sampling, probability distributions, sampling distributions, and the central limit theorem. The sampling distribution of the sample mean is examined in depth. For a sample mean, the expected value is equal to the population mean, while the standard error depends on factors like the population standard deviation and sample size. Examples are provided to illustrate these statistical properties.
The ppt gives an idea about basic concept of Estimation. point and interval. Properties of good estimate is also covered. Confidence interval for single means, difference between two means, proportion and difference of two proportion for different sample sizes are included along with case studies.
This document summarizes a research article about using particle swarm optimization to find different shrinkage parameters (k values) for each explanatory variable in ridge regression, rather than a single k value. Ridge regression is used to address multicollinearity issues in multiple regression analysis. Typically, ridge regression estimates a single k value, but this study uses an algorithm based on particle swarm optimization to estimate different k values for each variable. The study applies this new method to real data and simulations to evaluate its performance compared to other ridge regression methods.
The document presents a study that compares five methods for estimating missing values in building sensor data: linear regression, weighted k-nearest neighbors, support vector machines, mean imputation, and replacing missing values with zero. The methods were evaluated using data from sensors in an office building in Japan, with the amount of missing data varied from 5% to 20%. Feature selection and inclusion of lagged variables as predictors were also examined to determine their effect on the methods' performance.
This document discusses various types of analysis of variance (ANOVA) statistical tests. It begins with an introduction to one-way ANOVA for comparing the means of three or more independent groups. Requirements for one-way ANOVA include a nominal independent variable with three or more levels and a continuous dependent variable. Assumptions of one-way ANOVA include normality and homogeneity of variances. The document then briefly discusses two-way ANOVA, MANOVA, ANOVA with repeated measures, and related statistical tests. Examples of each type of ANOVA are provided.
1) Non-parametric tests make fewer assumptions than parametric tests about the population distribution. They do not require the assumptions of normality and equal variances.
2) Some common non-parametric tests described in the document include the Mann-Whitney U test for comparing two independent samples, the Wilcoxon Rank Sum test for comparing two independent samples, and the Wilcoxon Signed Rank test for comparing two related samples.
3) The Kruskal-Wallis H test is also described, which is the non-parametric equivalent of the one-way ANOVA and can be used to compare three or more independent samples.
The document provides an overview of analysis of variance (ANOVA). It defines ANOVA and discusses its key concepts, including how it was developed by Ronald Fisher. It also covers one-way and two-way ANOVA, describing their techniques and providing examples. The uses, advantages and limitations of ANOVA are outlined.
Parameter Optimisation for Automated Feature Point Detection (Dario Panada)
Parameter optimization for an automated feature point detection model was explored. Increasing the number of random displacements up to 20 improved performance but additional increases did not. Larger patch sizes consistently improved performance. Increasing the number of decision trees did not affect performance for this single-stage model, unlike previous findings for a two-stage model. Overall, some parameter tuning was found to enhance the model's accuracy but not all parameters significantly impacted results.
- Analysis of variance (ANOVA) is a statistical technique used to determine if the means of different groups are significantly different from each other.
- ANOVA separates the total variation in a data set into component parts associated with different sources of variation to test their statistical significance.
- The document provides definitions of ANOVA, assumptions of ANOVA, techniques for one-way and two-way ANOVA including calculation of sum of squares, variance, and the F-ratio to test for significance of differences between means.
- An example illustrates a one-way ANOVA calculation to test for differences in crop yields between four varieties.
This document provides an overview of statistics and statistical tests. It defines descriptive statistics as concerned with data collection, presentation and interpretation, while inferential statistics involves drawing conclusions from statistical analysis. Parametric tests can be applied to normally distributed interval/ratio data, while non-parametric tests do not require normality assumptions. Examples of parametric and non-parametric tests are provided, along with guidelines for applying a two-sample t-test to compare means between two independent groups. Two examples of applying a t-test are given to test differences between groups.
RESIDUALS AND INFLUENCE IN NONLINEAR REGRESSION FOR REPEATED MEASUREMENT DATA (orajjournal)
Not all observations have equal significance in regression analysis; diagnostics of observations is an important aspect of model building. In this paper, diagnostic methods are used to detect residuals and influential points in nonlinear regression for repeated measurement data. Cook's distance and the Gauss-Newton method are proposed to identify outliers in nonlinear regression analysis and parameter estimation. Most of these techniques are based on graphical representations of residuals, the hat matrix, and case-deletion measures. The results show detection of single and multiple outlier cases in repeated measurement data, and these techniques are used to explore the performance of residuals and influence in the nonlinear regression model.
This document describes different types of statistical tests used for hypothesis testing:
Type I - V describe tests for differences between sample and population proportions and means using z-tests. Type VI describes a z-test for differences between two sample standard deviations. Small sample tests using Student's t-distribution are also described for types I-III. An F-test is used to test for equality of variances between populations. Chi-square tests are used for goodness of fit, independence of attributes, and homogeneity. The steps involved in hypothesis testing are outlined.
This document summarizes a research paper that examines pricing strategy in a two-stage supply chain consisting of a supplier and retailer. The supplier offers a credit period to the retailer, who then offers credit to customers. A mathematical model is formulated to maximize total profit for the integrated supply chain system. The model considers three cases based on the relative lengths of the credit periods offered at each stage. Equations are developed to represent the profit functions for the supplier, retailer and overall system in each case. The goal is to determine the optimal selling price that maximizes total integrated profit.
The document discusses melanoma skin cancer detection using a computer-aided diagnosis system based on dermoscopic images. It begins with an introduction to skin cancer and melanoma. It then reviews existing literature on automated melanoma detection systems that use techniques like image preprocessing, segmentation, feature extraction and classification. Features extracted in other studies include asymmetry, border irregularity, color, diameter and texture-based features. The proposed system collects dermoscopic images and performs preprocessing, segmentation, extracts 9 features based on the ABCD rule, and classifies images using a neural network classifier to detect melanoma. It aims to develop an automated diagnosis system to eliminate invasive biopsy procedures.
This document summarizes various techniques for image segmentation that have been studied and proposed in previous research. It discusses edge-based, threshold-based, region-based, clustering-based, and other common segmentation methods. It also reviews applications of segmentation in medical imaging, plant disease detection, and other fields. While no single technique can segment all images perfectly, hybrid and adaptive methods combining multiple approaches may provide better results. Overall, image segmentation remains an important but challenging task in digital image processing and computer vision.
This document summarizes a research paper that proposes a portable device called the "Disha Device" to improve women's safety. The device has features like live location tracking, audio/video recording, automatic messaging to emergency contacts, a buzzer, flashlight, and pepper spray. It is designed using an Arduino microcontroller connected to GPS and GSM modules. When the button is pressed, it sends an alert message with the woman's location, sets off an alarm, activates the flashlight and pepper spray for self-defense. The goal is to provide women a compact, one-click safety system to help them escape dangerous situations or call for help with just a single press of a button.
- The document describes a study that constructed physical fitness norms for female students attending social welfare schools in Andhra Pradesh, India.
- Researchers tested 339 students in classes 6-10 on speed, strength, agility and flexibility tests. Tests included 50m run, bend and reach, medicine ball throw, broad jump, shuttle run, and vertical jump.
- The results showed that 9th class students had the best average time for the 50m run. 10th class students had the highest flexibility on average. Strength and performance generally improved with increased class level.
This document summarizes research on downdraft gasification of biomass. It discusses how downdraft gasifiers effectively convert solid biomass into a combustible producer gas. The gasification process involves pyrolysis and reactions between hot char and gases that produce CO, H2, and CH4. Downdraft gasifiers are well-suited for biomass gasification due to their simple design and ability to manage the gasification process with low tar production. The document also reviews previous studies on gasifier configuration upgrades and their impact on performance, and the principles of downdraft gasifier operation.
This document summarizes the design and manufacturing of a twin spindle drilling attachment. Key points:
- The attachment allows a drilling machine to simultaneously drill two holes in a single setting, improving productivity over a single spindle setup.
- It uses a sun and planet gear arrangement to transmit power from the main spindle to two drilling spindles.
- Components like gears, shafts, and housing were designed using Creo software and manufactured. Drill chucks, bearings, and bits were purchased.
- The attachment was assembled and installed on a vertical drilling machine. It is aimed at improving productivity in mass production applications by combining two drilling operations into one setup.
The document presents a comparative study of different gantry girder profiles for various crane capacities and gantry spans. Bending moments, shear forces, and section properties are calculated and tabulated for 'I'-section with top and bottom plates, symmetrical plate girder, 'I'-section with 'C'-section top flange, plate girder with rolled 'C'-section top flange, and unsymmetrical plate girder sections. Graphs of steel weight required per meter length are presented. The 'I'-section with 'C'-section top flange profile is found to be optimized for biaxial bending but rolled sections may not be available for all spans.
This document summarizes research on analyzing the first ply failure of laminated composite skew plates under concentrated load using finite element analysis. It first describes how a finite element model was developed using shell elements to analyze skew plates of varying skew angles, laminations, and boundary conditions. Three failure criteria (maximum stress, maximum strain, Tsai-Wu) were used to evaluate first ply failure loads. The minimum load from the criteria was taken as the governing failure load. The research aims to determine the effects of various parameters on first ply failure loads and validate the numerical approach through benchmark problems.
This document summarizes a study that investigated the larvicidal effects of Aegle marmelos (bael tree) leaf extracts on Aedes aegypti mosquitoes. Specifically, it assessed the efficacy of methanol extracts from A. marmelos leaves in killing A. aegypti larvae (at the third instar stage) and altering their midgut proteins. The study found that the leaf extract achieved 50% larval mortality (LC50) at a concentration of 49 ppm. Proteomic analysis of larval midguts revealed changes in protein expression levels after exposure to the extract, suggesting its bioactive compounds can disrupt the midgut. The aim is to identify specific inhibitor proteins in the midgut.
This document presents a system for classifying electrocardiogram (ECG) signals using a convolutional neural network (CNN). The system first preprocesses raw ECG data by removing noise and segmenting the signals. It then uses a CNN to extract features directly from the ECG data and classify arrhythmias without requiring complex feature engineering. The CNN architecture contains 11 convolutional layers and is optimized using techniques like batch normalization and dropout. The system was tested on ECG datasets and achieved classification accuracy of over 93%, demonstrating its effectiveness at automated ECG classification.
This document presents a new algorithm for extracting and summarizing news from online newspapers. The algorithm first extracts news related to the topic using keyword matching. It then distinguishes different types of news about the same topic. A term frequency-based summarization method is used to generate summaries. Sentences are scored based on term frequency and the highest scoring sentences are selected for the summary. The algorithm was evaluated on news datasets from various newspapers and showed good performance in intrinsic evaluation metrics like precision, recall and F-score. Thus, the proposed method can effectively extract and summarize online news for a given keyword or topic.
Data Communication and Computer Networks Management System Project Report.pdf (Kamal Acharya)
A network is a telecommunications system that allows computers to exchange data. In
computer networks, networked computing devices pass data to each other along data
connections. Data is transferred in the form of packets. The connections between nodes are
established using either cable media or wireless media.
A High-Speed Communication System Based on the Design of a Bi-NoC Router, ... (DharmaBanothu)
The Network on Chip (NoC) has emerged as an effective
solution for intercommunication infrastructure within System on
Chip (SoC) designs, overcoming the limitations of traditional
methods that face significant bottlenecks. However, the complexity
of NoC design presents numerous challenges related to
performance metrics such as scalability, latency, power
consumption, and signal integrity. This project addresses the
issues within the router's memory unit and proposes an enhanced
memory structure. To achieve efficient data transfer, FIFO buffers
are implemented in distributed RAM and virtual channels for
FPGA-based NoC. The project introduces advanced FIFO-based
memory units within the NoC router, assessing their performance
in a Bi-directional NoC (Bi-NoC) configuration. The primary
objective is to reduce the router's workload while enhancing the
FIFO internal structure. To further improve data transfer speed,
a Bi-NoC with a self-configurable intercommunication channel is
suggested. Simulation and synthesis results demonstrate
guaranteed throughput, predictable latency, and equitable
network access, showing significant improvement over previous
designs.
Online train ticket booking system project.pdf (Kamal Acharya)
Rail transport is one of the important modes of transport in India. Nowadays railways serve
both long- and short-distance travel, which makes people's lives easier. Compared to other
means of transport, the railway is the cheapest. Maintenance of the railway database also
plays a major role in the smooth running of this system. The Online Train Ticket
Management System helps in reserving railway tickets to travel from a particular source to
a destination.
Sachpazis_Consolidation Settlement Calculation Program-The Python Code and th... (Dr. Costas Sachpazis)
Consolidation Settlement Calculation Program-The Python Code
By Professor Dr. Costas Sachpazis, Civil Engineer & Geologist
This program calculates the consolidation settlement for a foundation based on soil layer properties and foundation data. It allows users to input multiple soil layers and foundation characteristics to determine the total settlement.
International Journal of Research in Advent Technology, Vol. 9, No. 1, January 2021
E-ISSN: 2321-9637
Available online at www.ijrat.org
doi: 10.32622/ijrat.91202104
Tests for a Single Upper Outlier in a Johnson SB Sample with
Unknown Parameters
Tanuja Sriwastava, Mukti Kant Sukla
Abstract: Outliers are unexpected observations that deviate from the majority of the observations. Outlier detection and prediction are challenging tasks, because outliers are by definition rare. A test statistic for a single upper outlier is proposed and applied to a Johnson SB sample with unknown parameters. The Johnson SB distribution has four parameters and is extremely flexible, which means that it can fit a wide range of distributional shapes; because of this it has applications in many fields. The test statistic proposed for the case when the parameters are known (Sriwastava, 2018) is used here to develop the test statistic for the case when the parameters are unknown. Critical points were calculated for different sample sizes and different levels of significance, and the performance of the test in the presence of a single upper outlier is investigated. A numerical example is given to highlight the result.
Keywords: Outlier, Johnson SB Distribution, Trimmed
Sample, Critical Points, Simulation.
I. INTRODUCTION
The p.d.f. of the Johnson SB distribution with location parameter ξ, scale parameter λ and two shape parameters γ and δ is given by

f(x; ξ, λ, γ, δ) = [δλ / (√(2π) (x − ξ)(λ − (x − ξ)))] · exp[−(1/2){γ + δ ln((x − ξ)/(λ − (x − ξ)))}²],  (1)

ξ ≤ x ≤ ξ + λ, δ > 0, −∞ < γ < ∞, λ > 0, −∞ < ξ < ∞.
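As a quick illustration (not part of the original paper), density (1) can be evaluated directly, and a sample can be drawn by inverting the normalising transform: if Z is standard normal, then X = ξ + λ / (1 + exp(−(Z − γ)/δ)) follows the distribution above. This is a minimal sketch; function names and the parameter values used are illustrative.

```python
import numpy as np

def johnson_sb_pdf(x, xi, lam, gamma, delta):
    """Density (1): delta*lam / (sqrt(2*pi)*(x-xi)*(lam-(x-xi))) * exp(-z^2/2),
    with z = gamma + delta*ln((x-xi)/(lam-(x-xi))); zero outside the support."""
    x = np.asarray(x, dtype=float)
    y = (x - xi) / lam                      # maps the support onto (0, 1)
    out = np.zeros_like(y)
    inside = (y > 0) & (y < 1)
    yi = y[inside]
    z = gamma + delta * np.log(yi / (1.0 - yi))
    out[inside] = delta / (np.sqrt(2 * np.pi) * lam * yi * (1.0 - yi)) * np.exp(-0.5 * z ** 2)
    return out

def johnson_sb_sample(n, xi, lam, gamma, delta, seed=None):
    """Sample by inverting the transform: Z ~ N(0,1) gives
    X = xi + lam / (1 + exp(-(Z - gamma)/delta))."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    return xi + lam / (1.0 + np.exp(-(z - gamma) / delta))
```

Every draw necessarily falls strictly inside (ξ, ξ + λ), which is what makes an observation outside that interval impossible under the null model.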
This distribution is extremely flexible, which means that it can fit a wide range of distributional shapes. Because of this flexibility it has applications in many fields, such as human exposure (Flynn, 2004), forestry data (Zhang et al., 2003), and rainfall data (Kottegoda, 1987), among many others. One of its most important applications is to complex data sets such as microarray data (George, 2007).
The detection of outliers in such situations is considerably important. A number of test statistics have been proposed for several distributions when their parameters are known (Barnett & Lewis, 1994).
A test statistic for the Johnson SB distribution with known parameters was proposed by Sriwastava (2018). In practice, however, no parameter would be known. In such situations, estimates of all the parameters are used in the construction of a test statistic for detecting outliers in a sample from a Johnson SB distribution; its critical values and the corresponding performance study were then obtained by simulation. Since the study concerns outlying observations, the entire sample should not be used to estimate the parameters. Hence the estimates given by George and Ramachandran (2011) are used, computed from the trimmed sample obtained after removing the suspected outlying observation(s).
The maximum likelihood-least squares estimates of the parameters of this distribution, as given by George and Ramachandran (2011), are as follows.
γ̂ = −δ̂ · [Σ_{i=1}^{n−1} g((xᵢ − ξ)/λ)] / (n − 1) = −δ̂ḡ,  (2)

δ̂² = (n − 1) / { Σ_{i=1}^{n−1} [g((xᵢ − ξ)/λ)]² − (1/(n − 1)) [Σ_{i=1}^{n−1} g((xᵢ − ξ)/λ)]² } = 1 / var(g),  (3)

where g((x − ξ)/λ) = log((x − ξ)/(λ − (x − ξ))), ḡ is the mean and var(g) is the variance of the values of g defined here.
The estimates of λ and ξ are as follows:

λ̂ = [ (n − 1) Σ_{i=1}^{n−1} xᵢ φ⁻¹((zᵢ − γ)/δ) − Σ_{i=1}^{n−1} φ⁻¹((zᵢ − γ)/δ) · Σ_{i=1}^{n−1} xᵢ ] / [ (n − 1) Σ_{i=1}^{n−1} [φ⁻¹((zᵢ − γ)/δ)]² − (Σ_{i=1}^{n−1} φ⁻¹((zᵢ − γ)/δ))² ],  (4)

ξ̂ = x̄ − λ̂ · mean[φ⁻¹((z − γ)/δ)],  (5)

Here φ⁻¹ denotes the inverse of the function g,
and z = (x − ξ)/λ is a standard normal variate. Hence the quantiles of x and the corresponding quantiles of z can be considered as paired observations. When there were 100 or more x values, the percentiles 1 through 99 were considered, while for k data points of x with k less than 100, k − 1 quantiles of x were considered.
Manuscript revised on January 29, 2021 and published on February
10, 2021
Dr. Tanuja Sriwastava, Assistant Professor, Department of Statistics, Sri
Venkateswara College, University of Delhi. Email ID: tanujastat24@gmail.com.
Dr. Mukti Kanta Sukla, Associate Professor, Department of Statistics,
Sri Venkateswara College, University of Delhi. Email ID:
suklamk@gmail.com
These k − 1 quantiles of x and the corresponding k − 1 quantiles of z were considered as paired observations.
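Equations (2) and (3) can be sketched as follows. This sketch (not from the paper) assumes ξ and λ are given — in the full method of George and Ramachandran (2011) they are themselves estimated via (4) and (5) — and uses the trimmed sample of the n − 1 smallest observations; the function name is illustrative.

```python
import numpy as np

def gamma_delta_hat(x, xi, lam):
    """Estimate gamma and delta from eqs. (2)-(3), using the trimmed sample
    (the n-1 smallest observations) and treating xi and lam as given."""
    x = np.sort(np.asarray(x, dtype=float))[:-1]   # drop the suspected outlier
    y = (x - xi) / lam
    g = np.log(y / (1.0 - y))                      # g((x - xi)/lam)
    # eq. (3) reduces to delta^2 = 1/var(g) with the "population" variance
    # mean(g^2) - mean(g)^2, which is numpy's default (ddof=0).
    delta_hat = 1.0 / np.sqrt(np.var(g))
    gamma_hat = -delta_hat * g.mean()              # eq. (2): -delta * g_bar
    return gamma_hat, delta_hat
```

Since z = γ + δg(y) is standard normal under the model, g has mean −γ/δ and variance 1/δ², which is exactly what (2) and (3) invert.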
II. PROPOSED OUTLIER DETECTION TEST STATISTICS
Let X₁, X₂, ⋯, Xₙ be a random sample from a Johnson SB distribution with γ and δ as shape parameters, λ a scale parameter and ξ a location parameter, and let X(1), X(2), ⋯, X(n) be the corresponding order statistics of the n observations.
Sriwastava (2018) proposed a test statistic for the detection of an upper outlier in a sample from a Johnson SB distribution when all the parameters were assumed known. When all the parameters of the Johnson SB distribution are unknown, we propose to use the MLE-least squares estimators of George and Ramachandran (2011) in place of the parameters in that test statistic.
The test statistic for the case of an upper outlier (unknown parameters) is given by

T′ = { [ (X(n) − ξ̂) / (X(n−1) − ξ̂) ] · [ (λ̂ − (X(n−1) − ξ̂)) / (λ̂ − (X(n) − ξ̂)) ] }^δ̂,  (6)

where X(k) is the kth order statistic, k = 1, 2, ⋯, n.
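Given estimates ξ̂, λ̂ and δ̂, equation (6) is a one-liner; a minimal sketch (function and variable names are illustrative):

```python
import numpy as np

def t_prime(x, xi_hat, lam_hat, delta_hat):
    """Upper-outlier statistic T' of eq. (6), built from the two largest
    order statistics X_(n) and X_(n-1)."""
    x = np.sort(np.asarray(x, dtype=float))
    xn, xn1 = x[-1], x[-2]
    ratio = ((xn - xi_hat) / (xn1 - xi_hat)) * \
            ((lam_hat - (xn1 - xi_hat)) / (lam_hat - (xn - xi_hat)))
    return ratio ** delta_hat
```

Both bracketed factors are at least 1 when X(n) ≥ X(n−1), so T′ ≥ 1, and pushing X(n) toward the upper end of the support ξ̂ + λ̂ drives T′ upward, which is why large values of T′ signal an upper outlier.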
2.1 Critical values of the test statistic
To detect an upper outlying observation in a sample from a Johnson SB distribution, the test statistic T′ is used; the null hypothesis is rejected for large values of T′. Critical values of T′ were obtained by simulation with 10,000 replications for different sample sizes. For this, a random sample was generated from a Johnson SB distribution and the estimates of all the parameters were calculated from equations (2)-(5) with this sample. The value of the test statistic was then calculated, the whole process was replicated 10,000 times, and the 90th, 95th and 99th percentile values were taken as the critical values at the 10, 5 and 1 percent levels of significance respectively. The critical values for sample sizes n = 3, 10(10)40(20)100, 200, 500, 1000 at the 10, 5 and 1 percent levels of significance are shown in Table 1.
Table 1. Critical values c_α of the statistic T′ at the 100α% level of significance

  n      10%       5%        1%
  3      3.90214   4.45028   13.7898
  10     3.22234   4.30638   7.82518
  20     2.79391   3.66518   6.0721
  30     2.54645   3.2718    5.2587
  40     2.41443   2.99493   4.93746
  60     2.35604   2.90261   4.75126
  80     2.30658   2.85917   4.48976
  100    2.26234   2.82311   4.30515
  200    2.12077   2.59739   3.83966
  500    1.96466   2.37174   3.52657
  1000   1.91691   2.27917   3.39466
It can be seen from the table that the critical values decrease as the sample size increases, and on comparing these critical values with those for the known-parameter case, the values are very close to each other for sample sizes of 10 and above.
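The simulation described above can be sketched as follows. For brevity this sketch (not from the paper) treats ξ and λ as known and re-estimates only γ and δ from the trimmed sample via equations (2)-(3), whereas the paper estimates all four parameters; the resulting percentiles therefore illustrate the procedure rather than reproduce Table 1.

```python
import numpy as np

def simulate_critical_values(n, reps=2000, xi=10.0, lam=30.0,
                             gamma=1.0, delta=2.0, seed=0):
    """Monte-Carlo critical values of T' under the null hypothesis of no
    outlier.  Simplification: xi and lam are treated as known; only delta
    is re-estimated from the trimmed sample as in eq. (3)."""
    rng = np.random.default_rng(seed)
    stats = np.empty(reps)
    for r in range(reps):
        z = rng.standard_normal(n)
        x = np.sort(xi + lam / (1.0 + np.exp(-(z - gamma) / delta)))
        g = np.log((x[:-1] - xi) / (lam - (x[:-1] - xi)))  # trimmed sample
        delta_hat = 1.0 / np.sqrt(np.var(g))
        ratio = ((x[-1] - xi) / (x[-2] - xi)) * \
                ((lam - (x[-2] - xi)) / (lam - (x[-1] - xi)))
        stats[r] = ratio ** delta_hat
    return np.percentile(stats, [90, 95, 99])  # 10%, 5%, 1% levels
```

The three returned percentiles are necessarily increasing, matching the pattern of each row of Table 1.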
2.2 Numerical Example
To highlight the utility of the statistic, the following 20 observations were taken from the 2011 Census data of India (in '000):
1028610, 1045547, 1062388, 1095722, 1112186,
1128521, 1160813, 1176742, 1192506, 1223581,
1238887, 1254019, 1283600, 1298041, 1312240,
1339741, 1352695, 1365302, 1388994, 1399838.
The above data are used for the case where the outlying observation comes from another sample with a shift in all the parameters. The critical value at the 5% level of significance considered below is 3.5966. The parameters were estimated from the sample obtained by leaving out the largest observation (as that is the suspected outlying observation). The value of the test statistic T′ was then calculated with the estimated parameter values and was found to be 5.99957. On comparing this with the critical value at the 5% level of significance for sample size 20, the null hypothesis is rejected, i.e. the largest observation of the sample is confirmed as outlying.
III. PERFORMANCE STUDY
The performance study was done using simulation
technique for detection of an upper outlier. A random
sample of size n was generated using R software from a
Johnson SB distribution with location parameter ๐(=10),
scale parameter ๐(=30), with two shape parameters ๐พ(=1)
and ๐ฟ(=2) (known). Then a contaminant observation was
introduced in the sample.
For introducing a contaminant observation, another
sample of Johnson SB distribution with a shift (๐๐),
where 0 < a < 1 in the location parameter was generated.
The largest observation of the original sample was
replaced with the largest observation of the second
sample. Since the estimates of the parameters depend
upon only n-1 largest observations, the largest
observation of the sample need not be removed for
estimation purpose. Thus the values of the estimates of all
the parameters were calculated using MLE least square
method. Using these estimated parameter values, the value of the test statistic T′ was calculated and compared with the critical value at the 5% level of significance. The process was simulated 10,000 times and the number of times the test statistic fell in the critical region was noted. The simulation study was carried out for different sample sizes. The probabilities of rejection of the null hypothesis for sample sizes n = 10(10)30, 60, 100, 200, 500, 1000 and for different values of a are shown in Table 2.
Table 2: Probabilities of identification of the upper contaminant observation.

  n      a = 0.033   0.067    0.1      0.133
  10     0.4056      0.5249   0.6475   0.7569
  20     0.3058      0.4362   0.5843   0.7315
  30     0.3242      0.4845   0.6557   0.8001
  60     0.2379      0.4095   0.6038   0.7856
  100    0.2055      0.3971   0.6323   0.8253
  200    0.2899      0.4253   0.7987   0.8541
  500    0.3643      0.6603   0.8878   0.9821
  1000   0.2401      0.5449   0.8283   0.963

  n      a = 0.2     0.27     0.3
  10     0.9056      0.9753   0.9898
  20     0.9166      0.9847   0.9942
  30     0.9583      0.9948   0.9985
  60     0.9669      0.9964   0.9987
  100    0.9781      0.9994   0.9995
  200    0.9887      1        1
  500    0.9868      1        1
  1000   0.9994      1        1
It can be observed from this table that the test statistic performs well for larger values of the shift, i.e., for a > 0.1, and its performance improves as the shift increases. Compared with the known-parameter case, the present test performs better for larger values of the shift, whereas the earlier test performed better for smaller values of the shift.
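The 10,000-replication design above can be condensed into a small Monte Carlo harness. Since the statistic Tn′ and its critical values are defined earlier in the paper, a crude standardized-maximum statistic is used below purely as a stand-in (an assumption, not the paper's Tn′):

```python
import numpy as np

def rejection_rate(stat_fn, crit, n, a, gamma, delta, xi, lam,
                   reps=2000, seed=0):
    """Monte Carlo estimate of P(statistic > critical value), mirroring the
    study design: simulate, contaminate the maximum, evaluate the statistic."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        z = rng.standard_normal(n)
        x = xi + lam / (1.0 + np.exp(-(z - gamma) / delta))
        z2 = rng.standard_normal(n)
        shifted = (xi + a * lam) + lam / (1.0 + np.exp(-(z2 - gamma) / delta))
        x = np.sort(x)
        x[-1] = shifted.max()              # upper contaminant
        hits += stat_fn(x) > crit
    return hits / reps

# Stand-in statistic (NOT the paper's Tn'): distance of the maximum
# from the rest of the sample, in units of the remaining spread.
stand_in = lambda x: (x.max() - x[:-1].mean()) / x[:-1].std()
rate = rejection_rate(stand_in, crit=2.0, n=30, a=0.3,
                      gamma=1.0, delta=2.0, xi=10.0, lam=30.0)
```

With the paper's Tn′ and its simulated 5% critical values substituted for the stand-in, this loop reproduces the entries of Table 2.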
IV. CONCLUSION
It can be concluded from the above study that the suggested test statistic for an upper outlier performs very well for larger values of the shift, whereas for smaller values of the shift its performance was not satisfactory. For large shifts and large sample sizes, the test statistic T′ is well suited to detecting one contaminant observation. Proposed directions for future research and implementation include:
• generalizing the proposed work to the multiple-outlier case;
• applying the proposed outlier detection technique to a variety of applications;
• using the approach adopted here to develop outlier detection test statistics for other complicated distributions like the Johnson SB.
REFERENCES
[1] Barnett V, Lewis T. Outliers in Statistical Data. John Wiley, 1994.
[2] Flynn MR. The 4-parameter lognormal (SB) model of human exposure. Ann Occup Hyg. 2004; 48:617-622.
[3] George F. Johnson's system of distributions and microarray data analysis. Graduate Theses and Dissertations, University of South Florida, 2007.
[4] George F, Ramachandran KM. Estimation of parameters of Johnson's system of distributions. Journal of Modern Applied Statistical Methods. 2011; 10(2).
[5] Johnson NL. Systems of frequency curves generated by methods of translation. Biometrika. 1949; 36:149-176.
[6] Kottegoda NT. Fitting Johnson SB curve by method of maximum likelihood to annual maximum daily rainfalls. Water Resour Res. 1987; 23:728-732.
[7] Sriwastava T. An upper outlier detection procedure in a sample from a Johnson SB distribution with known parameters. International Journal of Applied Statistics and Mathematics. 2018; 3(2):194-198.
[8] Zhang L, Packard PC, Liu C. A comparison of estimation methods for fitting Weibull and Johnson's SB distributions to mixed spruce-fir stands in northeastern North America. Can J Forest Res. 2003; 33:1340-1347.
AUTHORS PROFILE
Dr. Tanuja Sriwastava completed her M.Sc. (Statistics) at Banaras Hindu University, Varanasi, and her D.Phil. in Statistics at the University of Allahabad, Prayagraj. She has qualified UGC-NET (JRF) and has five international and national publications. Her research interests are mainly in Distribution Theory and Statistical Inference.
Dr. Mukti Kanta Sukla is a committed senior Associate Professor in the Department of Statistics, with over 25 years of experience at Sri Venkateswara College, one of the leading colleges of Delhi University. He is focused on research and has 10 international and national publications. He completed his Ph.D. at Utkal University, with research on stochastic modelling of rainfall data from the Mahanadi Delta region. He has vast teaching experience in courses such as Linear Models, Econometrics, Algebra, and Sample Survey Methods, and possesses excellent administrative, communication, and leadership skills, along with constructive and effective methods that promote a stimulating learning environment.