Tips for Presenting Statistical Data


Welcome to our comprehensive guide on mastering the art of presenting statistical data. In this post, we'll explore essential tips that can transform your data presentation skills. We understand that statistics can be overwhelming, and presenting them in an engaging, understandable way can be challenging. But don't worry, we've got you covered. Whether you're a student, a researcher, or a business professional, this guide will help you present statistical data effectively.

Understanding Your Audience

Knowing your audience is the first step in presenting statistical data effectively. You need to understand their background, their level of knowledge about the topic, and what they expect from your presentation. This understanding will guide you in choosing the right statistical data and the best way to present it.

For instance, if your audience is not familiar with statistical jargon, you should avoid using complex terms and focus on presenting the data in a simple, understandable way. Use visuals to illustrate your points and explain the significance of the data in a language your audience can understand.

On the other hand, if your audience is well-versed in statistics, you can delve deeper into the data. You can use more complex graphs and charts, discuss the methodology used in data collection, and engage your audience in a more technical discussion. Remember, the goal is to communicate effectively with your audience, not to impress them with your statistical prowess.

Choosing the Right Visuals

Visuals play a crucial role in presenting statistical data. They help your audience understand the data quickly and easily. However, not all visuals are created equal. You need to choose the right visual based on the type of data you're presenting and the message you want to convey.

Bar charts, for instance, are great for comparing quantities across different categories. Line graphs, on the other hand, are ideal for showing trends over time. Pie charts can be used to show proportions of a whole, while scatter plots are perfect for showing relationships between two variables.

When creating visuals, keep them simple and uncluttered. Avoid using too many colors or unnecessary decorations that can distract your audience. Also, make sure to label your visuals clearly and provide a brief explanation of what they represent.
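The chart-selection rules of thumb above can be sketched as a tiny lookup helper. This is purely illustrative code of our own; the goal labels and function name are made up, not any standard API:

```python
# Hypothetical rule-of-thumb helper mapping a presentation goal to a chart
# type, following the guidance above. The goal names are our own labels.
CHART_GUIDE = {
    "compare categories": "bar chart",
    "trend over time": "line graph",
    "parts of a whole": "pie chart",
    "relationship between two variables": "scatter plot",
}

def suggest_chart(goal: str) -> str:
    """Return a suggested chart type for a presentation goal."""
    return CHART_GUIDE.get(goal.lower(), "start with a simple table")

print(suggest_chart("trend over time"))   # line graph
print(suggest_chart("parts of a whole"))  # pie chart
```

The fallback ("start with a simple table") reflects the advice to keep things simple when no chart type is an obvious fit.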

Using Clear and Concise Language

When presenting statistical data, it's important to use clear and concise language. Avoid using jargon or complex terms that your audience may not understand. Instead, explain the data in simple terms and focus on the key points you want your audience to remember.

For example, instead of saying "The data shows a statistically significant positive correlation between X and Y", you could say "As X increases, Y also tends to increase". This way, you're not only making the data easier to understand, but you're also highlighting the main takeaway for your audience.

Also, when discussing the results, avoid making absolute statements unless the data supports them. Instead, use phrases like "the data suggests" or "the results indicate" to show that you're interpreting the data, not stating facts.
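As a small illustration of translating a statistic into plain, hedged language, the sketch below computes Pearson's correlation coefficient from its definition and reports it the way suggested above. The numbers are toy data invented for this example:

```python
from math import sqrt

# Toy data (made up for illustration): advertising spend (x) and sales (y).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.2, 8.9]

# Pearson's r computed directly from its definition.
n = len(x)
mx, my = sum(x) / n, sum(y) / n
cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
r = cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

# Report in plain, hedged language rather than as an absolute claim.
if r > 0:
    print(f"r = {r:.2f}: as x increases, y also tends to increase.")
elif r < 0:
    print(f"r = {r:.2f}: as x increases, y tends to decrease.")
else:
    print(f"r = {r:.2f}: the data suggest no linear relationship.")
```

Note that the printed sentence, not the raw coefficient, is what a non-technical audience will remember.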

Telling a Story with Your Data

One of the most effective ways to engage your audience and make your data memorable is by telling a story. Instead of just presenting the numbers, show your audience what those numbers mean. Connect the data to real-world situations or issues that your audience cares about.

For instance, if you're presenting data on climate change, you could start by showing the rising global temperatures over the years. Then, you could relate this data to the increasing frequency of wildfires or the melting of polar ice caps. By doing this, you're not just presenting data, you're telling a story that your audience can relate to and remember.

Remember, the goal of presenting statistical data is not just to inform, but also to persuade and inspire action. By telling a story with your data, you can achieve all these goals.

Practicing Your Presentation

Practice makes perfect, and this is especially true when it comes to presenting statistical data. Before your presentation, take the time to practice. This will help you become more familiar with the data and your visuals, and it will also help you anticipate any questions your audience might have.

When practicing, pay attention to your pacing. You don't want to rush through your presentation, but you also don't want to drag it out. Aim for a pace that allows your audience to absorb the information, but also keeps them engaged.

Also, practice your body language and tone of voice. These non-verbal cues can greatly affect how your audience perceives your presentation. Stand tall, make eye contact, and speak with confidence. Remember, you're not just presenting data, you're also selling an idea.

Handling Questions and Feedback

After your presentation, be prepared to handle questions and feedback from your audience. This is an opportunity for you to clarify any points that your audience may not have understood, and to further discuss the implications of your data.

When answering questions, be honest and straightforward. If you don't know the answer, admit it and offer to find out. Also, be open to feedback. Your audience's insights and perspectives can help you improve your future presentations.

Remember, presenting statistical data is not just about showing numbers. It's about communicating effectively, engaging your audience, and making your data meaningful and memorable.

Wrapping Up: Mastering Data Presentation

Presenting statistical data effectively is an art that requires understanding your audience, choosing the right visuals, using clear language, telling a story, practicing your presentation, and handling questions and feedback. By mastering these skills, you can transform your data presentations from dull and confusing to engaging and memorable. So, start applying these tips today and see the difference they can make in your data presentation skills.


Data Presentation: Types and Its Importance

What Is Data Presentation?

Data analysis and data presentation have practical applications in nearly every field, from academic research to commercial, industrial, and marketing activities and professional practice.

In its raw form, data can be extremely difficult to decipher; data analysis is the essential step that breaks raw data down into understandable charts or graphs so that meaningful insights can be extracted.

Data analysis tools process raw data further so that it can support any number of downstream applications.

The process of analyzing data therefore helps in interpreting raw data and extracting its useful content; the transformed data yields useful information.

Once the required information has been obtained from the data, the next step is to present it graphically.

Presentation is key to success: once the information is obtained, transforming it into a pictorial presentation yields a better response and outcome.

Methods of Data Presentation in Statistics

1. Pictorial presentation

This is the simplest form of data presentation, often used in schools and universities, where students grasp concepts more effectively through a pictorial presentation of simple data.

2. Column chart

A column chart extends the pictorial presentation to handle a larger amount of data while still providing clear insight into the data being shared.

3. Pie Charts

Pie charts provide a descriptive 2D depiction of how a whole is divided into parts, making them suited to showing proportions rather than precise comparisons between categories.

4. Bar charts

A bar chart shows data with rectangular bars whose lengths are directly proportional to the values they represent. The bars can be placed either vertically or horizontally depending on the data being represented.

5. Histograms

A histogram shows the spread of numerical data. The main feature that separates bar charts from histograms is the gaps: bar charts have gaps between bars, while histogram bars touch because the underlying scale is continuous.

6. Box plots

A box plot represents groups of numerical data through their quartiles. This style of graph makes data presentation easier by allowing even small differences between groups to be seen at a glance.

7. Map graphs

Map graphs help you present data over a geographic area and display areas of concern. They are useful for depicting data precisely across a large region.

All these visual presentations share a common goal: creating meaningful insights and a platform for understanding and managing the data, so that future decisions and actions can be planned and executed with a deeper understanding of the details.

Importance of Data Presentation

Data presentation can be a deal maker or a deal breaker, depending on how the content is delivered visually.

Data presentation tools are powerful communication tools: they simplify data, making it easily understandable and readable; they attract and keep the reader's interest; and they effectively showcase large amounts of complex data in a simplified manner.

Given the same sets of facts and figures, a user who creates an insightful presentation of the data at hand can expect impressive results.

There have been situations where a user had a great amount of data and a vision for expansion, but a poor presentation drowned that vision.

To impress the higher management and top brass of a firm, effective presentation of data is needed.

Good data presentation spares clients and audiences the effort of grasping the concept and the future alternatives of the business, and helps convince them to invest in the company, making it profitable for both the investors and the company.

Although data presentation has much to offer, the following are some of the major reasons an effective presentation matters:

  • Many consumers and higher authorities are interested in the interpretation of data, not the raw data itself. After analyzing the data, present it visually for better understanding and knowledge.
  • The presenter should not overwhelm the audience with slides crammed with text; well-chosen pictures will speak for themselves.
  • Data presentation often happens in a nutshell, with each department showcasing its contribution to company growth through a graph or a histogram.
  • Providing a brief description helps the presenter capture attention quickly while informing the audience about the context of the presentation.
  • The inclusion of pictures, charts, graphs, and tables in the presentation helps the audience better understand the potential outcomes.
  • An effective presentation allows an organization to benchmark itself against peer organizations and acknowledge its flaws. Comparing data assists in decision making.


Surgical Infections

Basic Introduction to Statistics in Medicine, Part 1: Describing Data

Wyatt P. Bensken, Fredric M. Pieracci, Vanessa P. Ho


Address correspondence to: Dr. Vanessa P. Ho, Department of Surgery, Division of Trauma, Critical Care, Burn, and Acute Care Surgery, 2500 MetroHealth Drive, Cleveland, OH 44109, USA [email protected]

Issue date 2021 Aug 1.

Background: Standardized and concise data presentation forms the base for subsequent analysis and interpretation. This article reviews types of data, data properties and distributions, and both numerical and graphical methods of data presentation.

Methods: For the purposes of illustration, the National Inpatient Sample was queried to categorize patients as having either emergency general surgery or non-emergency general surgery admissions.

Results: Variables are categorized as either categorical or numerical. Within the former, there are ordinal and nominal subtypes; within the latter, there are ratio and interval subtypes. Categorical data are typically displayed as number (%). Numerical data must be assessed for normality because normally distributed data behave in certain patterns that allow specific statistical tests to be used. Several properties exist for numerical data, including measurements of central tendency (mean, median, and mode), as well as standard deviation, range, and interquartile range. The best initial assessment of the distribution of numerical data is graphical, with both histograms and box plots.

Conclusion: Knowledge of the types, distribution, and properties of data is essential to move forward with hypothesis testing.

Keywords: data description, data science, statistics

Counting and measurement are the basis of all research, and accurate representation of numeric data ensures that research is systematic and reproducible. After the design of a research study, the most critical juncture in a project is a complete and accurate description of the data and the methods used to obtain the results. Utilizing a systematic description of the data as a first step not only ensures transparent reporting of results but also helps the investigator identify potential problems in the analytic process or data sources to guide analytic decisions. Examining the distribution and structure of data ensures that the tests and analyses chosen are the most appropriate and statistically valid. In addition to aiding the investigator, a clear description of the methods and data will aid peer review and the study's utility in the broader research enterprise. Specifically, the description helps readers understand the external validity of a particular study; in other words, are the findings generalizable to other populations? When drafting a manuscript, the description of data presentation and analysis should be standardized to the point where, after reading it, an independent party could reproduce your results exactly.

There are two cornerstones to an appropriate description of data: (1) a well-developed and presented table that describes your population, often referred to as a demographics table or Table 1 and (2) data visualization with appropriately chosen graphics. In this article, we provide examples of how to describe and visualize data using a nationally representative database, the Nationwide Inpatient Sample, to demonstrate a robust and thorough description of the methods and data used, while also highlighting specific pitfalls. We also demonstrate how weighted databases may add an extra layer of complexity to describing your study population. It is our goal that this work provides a road map for investigators seeking to utilize best practices in describing and presenting their data.

Types of Quantitative Data

To demonstrate these data science statistical practices and pitfalls, we used data from the 2017 Nationwide Inpatient Sample (NIS) from the Healthcare Cost and Utilization Project (HCUP). The NIS is an approximately 20% sample of all-payer hospitalizations that are included as part of HCUP that are then weighted to provide national estimates. This weighting means that each observed hospitalization in the sample represents a specific number of hospitalizations in the population. With this, the sample of 7.1 million hospitalizations represents more than 35.7 million hospitalizations. It includes parameters covering patient demographics (race, gender, age, payer, etc.), admission and discharge status, diagnoses, procedures, length of stay (LOS), and cost. All data are at the discharge-level and the NIS does not provide patient identifiers to be able to link hospitalizations. In this study we identified patients who underwent emergency general surgery (EGS) in 2017. Here, EGS is defined as appendectomy, colectomy and colostomy, laparotomy, laparoscopy, lysis of adhesions, small bowel resection, ulcer repair, and gallbladder procedures, as previously described by Smith et al. [ 1 ]. Specifically, we required that the hospitalization contain both a diagnosis and procedure code for EGS.

Of note, NIS data are structured to be able to perform a weighted adjustment to establish a nationally representative sample. For this article, however, the only weighted analysis we present is for the overall number of EGS procedures. This weighting followed guidelines from the Agency for Healthcare Research and Quality (AHRQ) using the given weights, cluster, and strata. Because of this weighting, the national estimates are presented with standard errors. Data cleaning was done via SAS, version 9 (SAS Institute, Cary, NC) with visualizations made in R version 3.6.1 using the tidyverse and patchwork packages [ 2 , 3 ]. Sample data available online were also used to build the skewed distributions in Figure 1 [ 4 ].

FIG. 1.

Example of normal and skewed distributions, using simulated data.

Using these data, we demonstrate how to construct a demographics table or Table 1 while also showing the value of graphical visualization of data to illustrate the distribution of age and LOS. The 2017 NIS contained 7,159,694 admissions that, when weighted, represent a national estimate of 35,798,453 hospitalizations. There was a total of 11,034 (1.6%) hospitalizations for emergency general surgery (EGS), representing an estimated 555,170 ± 5,969 (1.6% ± 0.01) nationally in 2017.

Data Cleaning and Categorization for Analysis

Data collection is typically organized via a data table, spreadsheet, or data frame. These datasets are typically organized such that each row of data represents one observation or unit to be studied (such as a single patient, one admission, or a hospital) and each column of data is a collected parameter (such as age or sex). Broadly, there are two types of variables: categorical (nominal and ordinal) and numeric (interval and ratio) ( Table 1 ). Categorical data represent named groups of observations and are not quantitative. Categorical data can be ordered (ordinal) or not ordered (nominal). In our example below, represented by Table 2 , gender, race, payer, and disposition are examples of categorical nominal variables. In the below example, the age categories (<18 years, 18–34, 35–49, etc.) are examples of ordered categorical variables.
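As a minimal sketch of this layout, consider a few hospitalization records held as rows, with each dictionary key acting as a column. The field names and values below are illustrative, not the actual NIS variable names:

```python
# A minimal sketch of the row/column layout described above: each row is one
# hospitalization, each key a collected parameter. Field names are
# illustrative, not the actual NIS variable names.
admissions = [
    {"age": 67, "gender": "F", "payer": "Medicare", "los_days": 4, "egs": True},
    {"age": 34, "gender": "M", "payer": "Private",  "los_days": 2, "egs": False},
    {"age": 81, "gender": "F", "payer": "Medicare", "los_days": 9, "egs": True},
]

# payer is a categorical (nominal) column; los_days is numerical (ratio) data.
payers = {row["payer"] for row in admissions}
total_los = sum(row["los_days"] for row in admissions)
print(payers, total_los)
```

In practice this structure is what a data frame in R, SAS, or pandas represents: one observation per row, one parameter per column.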

Table of Demographics

Description of the study population, comparing those hospitalization not for EGS and those for EGS. These data come from the 2017 Nationwide Inpatient Sample. Note that two cells are presented as “<” (less than); this is due to data restrictions of displaying cells less than 11.

EGS = emergency general surgery; SD = standard deviation; IQR = interquartile range; LOS = length of stay; SNF = skilled nursing facility; ICF = intermediate care facility.

Numerical data are collected as numbers. Length of stay is an example of numerical data. Length of stay is a continuous variable, meaning that it is a measure of length, represented by the unit “days” and usually rounded to the nearest integer. Length of stay is also an example of “ratio” data, whereby the numbers are meaningfully related and zero is an absolute number. In other words, a person who had a LOS of 6 days was in the hospital twice as long as a person in the hospital for 3 days, and no one has a negative LOS. This differs from interval data. Interval data are characterized by numbers that have equal distances between values but there is no fixed beginning. An example of this is time in a 12-hour clock. These distinctions are important because some numbers should not be added or subtracted, and only ratio data can be interpreted as multiples of each other. Some numeric data should not be treated as continuous, such as injury severity scale (ISS) because an ISS of 20 is not twice as bad as an ISS of 10. Furthermore, other seemingly numeric data do not even represent numbers, such as medical record number or zip code, which should be considered categorical data because the numbers are really only assigned labels.

Numerical data can be converted to categories if the researchers believe this conversion is appropriate. However, it is important to remember that converting data from continuous to categorical necessarily results in a loss of information granularity. This may limit future analyses. Age is a continuous numerical variable that consists of ratio data. In Table 2 , age is described multiple ways. As continuous numerical data, age can be represented as a distribution with a mean and standard deviation, or a median and interquartile range. Alternatively, age was also converted into a categorical ordinal variable. We elected to present standard groups, namely, <18, 18–34, 35–49, 50–64, 65–79, 80+. These groups are not even intervals but are socially representative of groups that have similar attributes (child, young adult, etc.); another way to categorize age might be by deciles. Yet another way to group numerical data would be into values either above or below the median value for that parameter. Finally, numerical data may be grouped into categories to replicate findings from previous research, in which certain groupings were found to be meaningful. The researchers can decide which data presentation is most appropriate for their study and study question, and whether "cutting" numeric data into categories is useful or advantageous to demonstrate specific concepts being studied.
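This "cutting" step can be sketched with the standard library. The cut points below are one common choice of socially representative age groups; they are our illustration, not the only valid grouping:

```python
from bisect import bisect_right

# Sketch of "cutting" a continuous variable (age) into ordinal categories.
# These cut points follow common social groupings (child, young adult, ...);
# they are one reasonable choice, not the only one.
CUTS   = [18, 35, 50, 65, 80]           # left-closed bin edges
LABELS = ["<18", "18-34", "35-49", "50-64", "65-79", "80+"]

def age_group(age: int) -> str:
    """Map a continuous age to its ordinal category."""
    return LABELS[bisect_right(CUTS, age)]

print(age_group(17), age_group(18), age_group(64), age_group(80))
```

Note that the categorical result preserves order but discards granularity: a 35-year-old and a 49-year-old become indistinguishable.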

Data distribution and properties

When visualizing data, we are often seeking some conclusion regarding the distribution of the data, that is, the shape of the data. Frequently, researchers try to determine whether data follow a normal (or bell-shaped) distribution but often encounter data that are either left-skewed or right-skewed. Figure 1 demonstrates a normal distribution as well as distributions that are left-skewed and right-skewed. The normal distribution is often desired because it allows a number of powerful statistical tests to be conducted with the data, such as the Student t-test and linear regression, whereas skewed distributions violate important statistical assumptions of these tests. Another common distribution found in medical research is a bimodal distribution, which has two peaks; this may occur, for example, if we saw the highest frequencies of a disease or condition in young adulthood and then again in older adulthood. Whereas the normal distribution is the most commonly discussed, it is actually found in only a minority of cases. It is important to note that there are numerous other statistical distributions with their own assumptions and analyses that are beyond the scope of this article but that researchers may encounter in the literature.

Mean, median, and mode are called measures of central tendency and are the simplest way to describe where the middle of a numerical data distribution lies. The arithmetic mean is the average of all the numbers (the sum of the numbers divided by the count of items included in the sum). Technically, numeric scales such as Likert scales or injury severity scores that are not ratio data should not be presented as means. In a 10-point Likert scale, a value of eight is not twice as large as a value of four, nor four times as large as a value of two, and thus a mean value cannot really be interpreted. A mean is most appropriate when a ratio continuous variable is normally distributed, that is, when the values are shaped like a classic bell curve. Means can also be used more confidently when sample sizes are large and are therefore more likely to follow a normal distribution.

The median value is the middle number when all numerical values are lined up sequentially. A median and range are less affected by outliers than a mean and standard deviation, which makes the median a better choice for variables with a skewed distribution, a large number of outliers, or a small sample size. Because no arithmetic is used to calculate them, median values are more interpretable for things such as scales or scores that cannot be added or subtracted. The mode is the most frequently observed value. For a parameter that is distributed normally, the mean, median, and mode are all the same.
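These three measures can be illustrated with Python's standard statistics module on made-up, right-skewed values resembling hospital length of stay:

```python
from statistics import mean, median, mode

# Simulated right-skewed data (e.g., length of stay in days);
# the values are made up for illustration.
los = [1, 2, 2, 2, 3, 3, 4, 5, 6, 8, 14, 30]

print(f"mean   = {mean(los):.1f}")  # pulled upward by the outliers 14 and 30
print(f"median = {median(los)}")    # middle value, robust to the outliers
print(f"mode   = {mode(los)}")      # most frequently observed value
```

Because the mean exceeds the median here, the same comparison described in the text signals a rightward (positive) skew without plotting anything.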

In addition to measurements of central tendency, the range, interquartile range, and standard deviation are useful properties. The range is displayed as the minimum and maximum value for the variable. Reviewing the minimum and maximum values can often help identify data entry errors, for example, an age of 510 years entered by mistake when the actual age was 51 years. The interquartile range represents the 25th percentile to the 75th percentile for the variable and is typically listed after the median. Mean values are typically displayed with a standard deviation, which indicates how wide the spread of numbers is around the average value.
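A short sketch of these spread measures, using made-up ages that include a deliberate data-entry error of the kind described above:

```python
from statistics import pstdev, quantiles

# Ages with a likely data-entry error (510 instead of 51);
# values are made up for illustration.
ages = [23, 34, 41, 47, 51, 55, 62, 68, 74, 510]

lo, hi = min(ages), max(ages)
q1, q2, q3 = quantiles(ages, n=4)   # 25th, 50th, 75th percentiles
iqr = q3 - q1
sd = pstdev(ages)                   # population standard deviation

print(f"range = [{lo}, {hi}]")           # the maximum of 510 flags the error
print(f"median (IQR) = {q2} ({q1}-{q3})")
print(f"sd = {sd:.1f}")                  # inflated by the single bad value
```

Notice that the median and interquartile range barely register the bad value, while the range and standard deviation expose it immediately, which is exactly why reviewing minimum and maximum values is a quick data-quality check.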

Demographics table example

In the example demographics table ( Table 2 ), categorical variables such as gender, race, payer, admission type, and disposition are presented as n (%) and these are relatively straightforward. Important groupings here are dependent on the researcher's aims. For example, race groups or disposition can be combined or separated.

We present multiple ways to show numerical data. Looking first at age, there is a small difference between mean and median: the mean age for the EGS and non-EGS groups is slightly lower than the median age, suggesting that there are young outliers that skew the mean age with a leftward tail. Grouping by age categories may provide extra detail about the age distribution, showing that more than one-half of all EGS and non-EGS admissions occur in adults over the age of 40, whereas hospitalizations for EGS occur in a lower proportion of pediatric patients.

Alternatively, the mean values for LOS as well as total charges are much larger than the median values, suggesting that there are outliers with long LOS that skew the data to have a long rightward tail. This is common for hospital and intensive care unit LOS data. For total charges, the standard deviations are larger than the value of the means, suggesting that there is a wide variation in charges and utilizing the mean for this variable is likely not the best approach for further analysis. Thus, without even seeing the actual data, the reader can make inferences about their shape based on the differences between mean and median calculations and also on the relative size of the standard deviation compared with the mean. Familiarity with the most common shapes of data such as age and LOS will also draw attention to unusual patterns and alert readers when the incorrect statistical test is being applied.

Data description and visualization using histograms

Although there are several statistical tests to assess for normality of a certain parameter, often the most obvious method is visual interpretation of a histogram. A histogram is a visual representation of the distribution of the data, where the frequency of a value is plotted on the y-axis, typically as bars, against the value of the variable on the x-axis. We present several histograms below, overlaying the normal distribution to highlight skewness. Of note, the y-axis here is not the frequency (the number of individuals in each bin) but rather the density. The density is a re-scaling of the frequency to accommodate a true normal distribution, where the area under the curve and the sum of the areas of the bars equal one. The visual shape of the distribution will be identical with either frequency or density on the y-axis. Formal comparisons of these data are presented in a follow-up article [ 5 ]. Figure 2 highlights the distribution of age between non-EGS and EGS hospitalizations. As suggested by the demographics table, there is a large number of young non-EGS admissions, which leads to skewing of the age data; the histogram shows this more clearly than the presentation of the means and medians alone. Note also that the non-EGS age has a trimodal distribution, with three peaks of frequency compared with only a single peak in the EGS group.

FIG. 2.

Distribution of age (in years) stratified by those hospitalizations that were not for emergency general surgery (EGS) and those that were for EGS.
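The density re-scaling described above can be computed by hand: divide each bin's frequency by (total count × bin width), so that the bar areas sum to one. A standard-library sketch with made-up values:

```python
# Compute a density histogram by hand: density = frequency / (n * bin_width),
# so the bar areas sum to 1. The data values are made up for illustration.
values = [2, 3, 3, 4, 5, 5, 5, 6, 7, 9, 12, 15]
bin_width = 4

bins = {}  # left bin edge -> count (frequency)
for v in values:
    left = (v // bin_width) * bin_width
    bins[left] = bins.get(left, 0) + 1

n = len(values)
density = {left: count / (n * bin_width) for left, count in bins.items()}

# The area of each bar is density * bin_width; the areas sum to 1.
total_area = sum(d * bin_width for d in density.values())
print(density, total_area)
```

As the text notes, the visual shape is identical whether frequency or density labels the y-axis; only the scale changes, which is what allows the normal curve to be overlaid.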

Another commonly used figure is the boxplot, seen in the lower half of Figure 3 . This is another way to demonstrate the distribution of the data and is a very efficient method of communicating data. The middle bar represents the median, the edges of the box are the first and third quartiles, and the lines (commonly called whiskers) represent the data extending to 1.5 times the interquartile range. Points outside this are displayed and represent the most extreme outliers. They are another useful visualization, especially when presenting the distribution of a value across groups (e.g., LOS stratified by race). Figures 2 and 3 demonstrate the distribution, and particularly the skewness, of two of the continuous variables of interest: age ( Fig. 3 ) and LOS. In particular, LOS shows a skewed distribution and inflation of the mean but arriving at these conclusions can be much easier using well-developed data visualizations such as Figure 3 . In these figures we can clearly see the outliers in the boxplots, whereas the histograms confirm that the distributions do not follow a normal distribution (the black curve overlaid). Additionally, we would likely want to present the median and interquartile range when describing these variables because we know the mean and standard deviation are highly sensitive to these outliers. Although we present these figures in this article, in a study we would likely include them as a supplement for reviewers and fellow researchers to reference if needed.

FIG. 3.

Distribution, both histogram and boxplot, of the age (in years) of those hospitalizations for emergency general surgery (EGS). The y-axis of the histogram represents the density (not frequency), and the normal curve for these data is overlaid to highlight the skew in age data for this population.
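The boxplot arithmetic described above (quartiles, 1.5 × IQR whisker fences, and outlier flagging) can be sketched with the standard library. The LOS values are made up for illustration:

```python
from statistics import quantiles

# Sketch of boxplot arithmetic: quartiles, 1.5 * IQR whisker fences,
# and outlier flagging. LOS values (days) are made up for illustration.
los = [1, 2, 2, 3, 3, 4, 4, 5, 6, 7, 9, 28]

q1, median, q3 = quantiles(los, n=4)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points beyond the fences are drawn individually as outliers.
outliers = [v for v in los if v < lower_fence or v > upper_fence]
print(f"median={median}, IQR=({q1}, {q3}), fences=({lower_fence}, {upper_fence})")
print("outliers:", outliers)
```

Here the single long stay (28 days) falls beyond the upper fence, mirroring the long rightward tail that inflates the mean LOS in the example data.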

Example of data description for a methods section of an article

Ideally, the methods section of an article will be comprehensive enough to allow your work to be reproduced. In addition to the overview, data source(s), study population, inclusion/exclusion criteria, and variables of interest (as we do in our own methods section), it is important to describe how data will be displayed. The portion of the methods that includes this information, from a hypothetical study, could read as follows: "Numerical data are expressed as median (interquartile range) and were assessed for normality using both the XXX test and visually using both histograms and boxplots. Categorical data are expressed as number (%). Because age was not distributed normally, and rather followed a bimodal distribution, this variable was converted to categorical and dichotomized around the median. Time to surgery was also not distributed normally and so was converted into three categories: <24 hours, 24–72 hours, and >72 hours, based on our prior study (appropriate citation)."

The complete description of our data, as the first step of the analysis stage, is crucial to understanding the study population as well as informing later statistical decisions. Describing the data can also serve as a check on study validity, helping to ensure that earlier parts of the study (e.g., data cleaning, processing, and management) did not introduce errors. For example, if we were studying a condition primarily prevalent in older adults but identified younger adults in the exploratory analysis, this would suggest either a data or coding error, which should be investigated thoroughly, or unique cases of the condition under study that may warrant exclusion.

This ability to spot errors also links to the ability to make additional cohort restrictions that refine the study population or remove heterogeneity. In our example of EGS, two key findings from our data exploration could influence future analytic decisions: age and admission type. Of our EGS population, 8% of hospitalizations were children and 31% were 65 years old or older (Table 1). In our study we might first exclude children from the analysis, given potential heterogeneity in disease presentation and management across age groups. If our study question examined only the geriatric population, we might restrict the analysis to the 31% who are 65 years old or older. Furthermore, although termed emergency general surgery, we identified that 16.2% of hospitalizations for EGS were labeled elective (Table 1), which highlights a limitation of administrative data and the use of diagnosis codes. For that reason, and in hopes of creating the most accurate case definition, we could consider restricting on both age and admission type to focus on older adults with non-elective admissions.
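The two restrictions discussed above (age 65 or older, and non-elective admission) amount to a simple filter on the cohort. A sketch with hypothetical records and illustrative column names:

```python
import pandas as pd

# Hypothetical EGS hospitalization records; columns are illustrative
cohort = pd.DataFrame({
    "age": [8, 34, 67, 72, 80, 15, 66, 90],
    "admission_type": ["non-elective", "elective", "non-elective",
                       "non-elective", "elective", "non-elective",
                       "non-elective", "non-elective"],
})

# Restrict to older adults (65+) with non-elective admissions,
# mirroring the two restrictions discussed in the text
restricted = cohort[(cohort["age"] >= 65) &
                    (cohort["admission_type"] == "non-elective")]
print(len(restricted))  # 4 of the 8 hypothetical records remain
```

In a real study, each such restriction should be documented (ideally with counts at each step, as in a cohort flow diagram) so readers can trace how the final study population was derived.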

Once the study cohort has been identified and the initial descriptive statistics have been computed, data visualization is an important next step. Visualization of the data, much like its description, serves two important purposes: first, it conveys important information about your study population; second, it aids decisions about subsequent statistical analyses. In particular, these visualizations help assess the normality of variables, identify skewness, and inform the validity of statistical comparisons and regression models, discussed in more detail elsewhere. A lack of normality would require us to use non-parametric analyses, which are detailed in a follow-up article [5].
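A formal normality test can complement the visual checks described above. As an illustration (not the study's actual procedure), the Shapiro-Wilk test applied to simulated right-skewed length-of-stay data rejects normality, pointing toward non-parametric methods:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated right-skewed length-of-stay data (lognormal), illustrative only
los = rng.lognormal(mean=1.5, sigma=0.8, size=500)

# Shapiro-Wilk test: a small p value suggests departure from normality,
# which would steer us toward medians, IQRs, and non-parametric tests
stat, p = stats.shapiro(los)
print(p < 0.05)  # True for this strongly skewed sample
```

In practice the visual check (histogram with overlaid normal curve, boxplot) and the formal test are used together, since formal tests become overly sensitive in large samples.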

Another important consideration in the creation of a demographics table is whether or not to include p values. Historically, these tables have included p values as an efficient way to identify statistically significant differences between the two groups, with a threshold of significance of 0.05 (that is, only p values <0.05 are considered statistically significant). The p value was brought to prominence by statistician Ronald Fisher in 1925 as a mechanism to assess the probability of obtaining a result as or more extreme than what was observed due to chance alone [6,7]. In recent years, however, there has been a shift away from reliance on p values for a myriad of reasons, including the over-emphasis on a threshold to determine the significance of results and the often misleading interpretation or reasoning surrounding these cut points [6–8]. An additional limitation of an arbitrary p value threshold is that in large datasets such as the NIS, statistical significance is easily achieved even when differences between groups are small and likely not clinically meaningful. For these reasons, we have chosen not to display p values and, instead, focus our description of the data on meaningful differences, leaving hypothesis testing to specific comparative questions.
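The large-dataset limitation is easy to demonstrate with simulated data. Here two groups differ by a clinically trivial amount (0.1 on, say, a length-of-stay scale in days), yet with 200,000 observations per group a t-test declares the difference highly "significant":

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Two simulated groups whose true means differ by only 0.1 day:
# a clinically trivial difference
group_a = rng.normal(5.0, 2.0, size=200_000)
group_b = rng.normal(5.1, 2.0, size=200_000)

_, p = stats.ttest_ind(group_a, group_b)
print(group_b.mean() - group_a.mean())  # roughly 0.1 day
print(p < 0.001)                        # True despite the trivial difference
```

This is why, in large administrative datasets, effect sizes and clinically meaningful differences are more informative in a demographics table than p values.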

The final important point to raise in this article is our analysis of the unweighted data. The NIS, like many other federal and nationally representative datasets, includes weighting information that makes it possible to create national estimates. We did present the national estimate for the number of hospitalizations, but the rest of our description used the unweighted data and thus cannot be taken as national estimates. One must think critically about the intention and goals of the study when deciding on weighting, because weighting adds another layer of complexity to describing the data, conducting the analyses, and reporting the results. In particular, weighting produces a standard error for each estimate and proportion. This standard error captures the complex survey design but makes reporting the results more challenging. Because the point of this article was not to produce national estimates but to demonstrate statistical principles, we chose not to account for weighting.
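The basic idea of weighting can be illustrated with a toy example: each record carries a weight representing the number of hospitalizations it stands for nationally, so weighted summaries differ from unweighted ones. Note this sketch shows only the weighted point estimates; proper standard errors for NIS-style data additionally require the survey design elements (strata and clusters), which is the added complexity the text refers to:

```python
import numpy as np

# Hypothetical records with NIS-style discharge weights: each record
# stands for roughly `weight` hospitalizations nationally (illustrative)
los = np.array([3.0, 5.0, 2.0, 10.0])        # length of stay, days
weights = np.array([5.0, 5.0, 4.0, 6.0])     # discharge weights

unweighted_mean = los.mean()                  # describes the sample only
weighted_mean = np.average(los, weights=weights)  # national-level estimate
national_count = weights.sum()                # estimated national total

print(unweighted_mean, round(weighted_mean, 2), national_count)
```

Even in this tiny example the unweighted mean (5.0 days) and weighted mean (5.4 days) disagree, which is why unweighted descriptions must not be reported as national estimates.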

In conclusion, accurately describing data in tables and figures helps inform decisions about study inclusion criteria, present and convey results to readers, and determine which statistical approaches are valid. Although the field has previously emphasized including p values in tables, recent guidance has de-emphasized this practice; descriptions of data should instead focus on meaningful differences, not just those that may be statistically significant.

Funding Information

Dr. Ho is supported by the Case Western Reserve University Clinical and Translational Science Collaborative of Cleveland (KL2TR002547).

Author Disclosure Statement

Dr. Ho's spouse is a consultant for Zimmer Biomet, Sig Medical, Atricure, and Medtronic.

This publication was made possible by the Clinical and Translational Science Collaborative of Cleveland, KL2TR002547 from the National Center for Advancing Translational Sciences (NCATS) component of the National Institutes of Health and NIH roadmap for Medical Research. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the NIH.

  • 1. Smith JW, Knight Davis J, Quatman-Yates CC, et al. Loss of community-dwelling status among survivors of high-acuity emergency general surgery disease. J Am Geriatr Soc 2019;67:2289–2297.
  • 2. patchwork: The Composer of Plots [computer program]. R package version 1.0.1; 2020.
  • 3. Wickham H, Averick M, Bryan J, et al. Welcome to the tidyverse. J Open Source Softw 2019;4:1686.
  • 4. Héroux M. Verify if data are normally distributed in R: Part 1. Scientifically Sound. 2018. https://scientificallysound.org/2018/06/07/test-normal-distribution-r/ (Last accessed January 12, 2021).
  • 5. Bensken WP, Ho VP, Pieracci FM. Basic introduction to statistics in medicine, part 2: Comparing data. Surg Infect 2021;22:597–603.
  • 6. Greenland S, Senn SJ, Rothman KJ, et al. Statistical tests, P values, confidence intervals, and power: A guide to misinterpretations. Eur J Epidemiol 2016;31:337–350.
  • 7. Kennedy-Shaffer L. Before p <0.05 to beyond p <0.05: Using history to contextualize p-values and significance testing. Am Stat 2019;73(Suppl 1):82–90.
  • 8. McShane BB, Gal D, Gelman A, Robert C, Tackett JL. Abandon statistical significance. Am Stat 2019;73(Suppl 1):235–245.