ggEDA

ggEDA streamlines exploratory data analysis by providing turnkey approaches to visualising n-dimensional data which can graphically reveal correlative or associative relationships between two or more features:

  • ggstack: tiled one-dimensional visualisations that more effectively show missingness and complex categorical relationships in smaller datasets.
  • ggparallel: parallel coordinate plots (PCPs) for examining large datasets with mostly quantitative features.

To create ggEDA visualisations through a Shiny app, see interactiveEDA.

Installation

install.packages("ggEDA")

Development Version

You can install the development version of ggEDA from GitHub with:

# Install remotes only if it is not already available
if (!requireNamespace("remotes", quietly = TRUE))
    install.packages("remotes")

remotes::install_github("CCICB/ggEDA")

Or from R-universe with:

install.packages("ggEDA", repos = "https://ropensci.r-universe.dev")

Quick Start

For examples of interactive EDA plots, see the ggEDA gallery.

# Load library
library(ggEDA)

# Plot data, sort by Glasses
ggstack(
  baseballfans,
  col_id = "ID",
  col_sort = "Glasses",
  interactive = FALSE,
  verbose = FALSE,
  options = ggstack_options(legend_nrow = 2)
)

Customise Colours

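Customise colours by supplying a named list to the palettes argument. Each list element is named after a column and holds a named vector that maps that column's values to colours.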

ggstack(
  baseballfans,
  col_id = "ID",
  col_sort = "Glasses",
  palettes = list("EyeColour" = c(
    Brown = "rosybrown4",
    Blue = "steelblue",
    Green = "seagreen"
  )),
  interactive = FALSE,
  verbose = FALSE,
  options = ggstack_options(legend_nrow = 2)
)
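
When a column has many levels, writing the palette by hand gets tedious. The following is a minimal sketch, using base R only, of building a list with the same shape as the one above. The `fans` data frame is a hypothetical stand-in rather than the `baseballfans` dataset, and the "Dark 3" palette is an arbitrary choice.

```r
# Toy stand-in for a dataset with a categorical column (hypothetical data)
fans <- data.frame(
  ID = 1:5,
  EyeColour = c("Brown", "Blue", "Green", "Brown", "Blue")
)

# One colour per observed level, named by level
eye_levels <- sort(unique(fans$EyeColour))
eye_palette <- setNames(
  grDevices::hcl.colors(length(eye_levels), palette = "Dark 3"),
  eye_levels
)

# Named list keyed by column name: the same shape as the hand-written list above
palettes <- list(EyeColour = eye_palette)
```

The resulting list could then be supplied as `palettes = palettes` in place of the hand-written version.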

A note on missing and infinite values

Infinite values in numeric columns are indicated with directional arrows (↓ for -Inf, ↑ for Inf) to differentiate them from missing (NA) values, which are represented by !.

data <- data.frame(
  numbers = c(1:3, Inf, -Inf, NA), 
  letters = LETTERS[1:6]
)

ggstack(data, interactive = FALSE, verbose = FALSE)
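
Before plotting, it can help to know how many such values each numeric column contains. This is a small base R sketch that does not use ggEDA; the row labels `n_missing`, `n_pos_inf` and `n_neg_inf` are names chosen here for illustration.

```r
data <- data.frame(
  numbers = c(1:3, Inf, -Inf, NA),
  letters = LETTERS[1:6]
)

# Restrict the check to numeric columns
is_numeric_col <- vapply(data, is.numeric, logical(1))

# Count NA, +Inf and -Inf per numeric column (one column of results per input column)
special_counts <- vapply(
  data[is_numeric_col],
  function(x) {
    c(
      n_missing = sum(is.na(x)),
      n_pos_inf = sum(x == Inf, na.rm = TRUE),
      n_neg_inf = sum(x == -Inf, na.rm = TRUE)
    )
  },
  integer(3)
)

special_counts
```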

If rendering numeric columns as heatmaps, infinite values are clamped to the min/max colours, while NA values remain grey. We can optionally add markers by setting show_na_marker_heatmap = TRUE.

ggstack(
  data, 
  interactive = FALSE, 
  verbose = FALSE,
  options = ggstack_options(numeric_plot_type = "heatmap", show_na_marker_heatmap = TRUE)
)
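
To make the clamping concrete, here is a base R sketch of the idea: infinite values are pulled to the finite minimum or maximum, and NA is left untouched. It illustrates the behaviour described above and is not ggEDA's internal implementation.

```r
numbers <- c(1:3, Inf, -Inf, NA)

# Range of the finite values only (is.finite() is FALSE for Inf, -Inf and NA)
finite_range <- range(numbers[is.finite(numbers)])

# Clamp to the finite range; pmin()/pmax() propagate NA by default
clamped <- pmin(pmax(numbers, finite_range[1]), finite_range[2])

clamped
#> [1]  1  2  3  3  1 NA
```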

Parallel Coordinate Plots

For datasets with many observations and mostly numeric features, parallel coordinate plots may be more appropriate.

ggparallel(
  data = minibeans,
  col_colour = "Class",
  order_columns_by = "auto",
  interactive = FALSE
)
#> ℹ Ordering columns based on mutual information with [Class]

ggparallel(
  data = minibeans,
  col_colour = "Class",
  highlight = "DERMASON",
  order_columns_by = "auto",
  interactive = FALSE
)
#> ℹ Ordering columns based on how well they differentiate 1 group from the rest [DERMASON] (based on mutual information)
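
The messages above mention ordering by mutual information. As a rough illustration of what that means, the sketch below estimates mutual information between each numeric column and a class label by binning the numeric values, then sorts the columns by that score. Several assumptions apply: it uses the built-in `iris` data rather than `minibeans`, the helper `mutual_information()` and its equal-width binning are hypothetical choices made here, and ggEDA's own estimator may differ.

```r
# Hypothetical helper: plug-in estimate of mutual information (in nats)
# between a numeric vector x (binned into equal-width bins) and a label y
mutual_information <- function(x, y, n_bins = 5) {
  x_binned <- cut(x, breaks = n_bins)
  joint <- table(x_binned, y) / length(x)   # joint probabilities
  expected <- outer(rowSums(joint), colSums(joint))  # product of marginals
  nonzero <- joint > 0                      # skip empty cells (0 * log 0 = 0)
  sum(joint[nonzero] * log(joint[nonzero] / expected[nonzero]))
}

numeric_cols <- names(iris)[vapply(iris, is.numeric, logical(1))]

# Score every numeric column against the full class label
mi_class <- vapply(
  numeric_cols,
  function(col) mutual_information(iris[[col]], iris$Species),
  numeric(1)
)
ordered_cols <- names(sort(mi_class, decreasing = TRUE))

# One-vs-rest variant: how well each column separates one group from the others
mi_setosa <- vapply(
  numeric_cols,
  function(col) mutual_information(iris[[col]], iris$Species == "setosa"),
  numeric(1)
)
ordered_cols_setosa <- names(sort(mi_setosa, decreasing = TRUE))
```

Columns with higher scores carry more information about the label, so placing them first puts the most discriminating axes where they are easiest to compare.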

ggparallel(
  data = minibeans,
  order_columns_by = "auto",
  interactive = FALSE
)
#> ℹ To add colour to plot set `col_colour` to one of: Class
#> ℹ Ordering columns to minimise crossings
#> ℹ Choosing axis order via repetitive nearest neighbour with two-opt refinement
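
The last two messages describe an axis-ordering heuristic. The base R sketch below shows one way such a heuristic can work: count line crossings between every pair of axes, build a path from each possible starting axis by always moving to the nearest remaining axis, refine each path with two-opt segment reversals, and keep the cheapest. This is an illustration under assumptions, not ggEDA's implementation: it uses `iris` instead of `minibeans`, and all function names are hypothetical.

```r
# Crossings between two adjacent axes = number of discordant observation pairs
count_crossings <- function(a, b) {
  sum(outer(a, a, "-") * outer(b, b, "-") < 0) / 2
}

# Total crossings along an axis ordering (an open path, not a cycle)
path_cost <- function(path, cost) {
  sum(cost[cbind(path[-length(path)], path[-1])])
}

# Greedy path: from the start axis, repeatedly move to the cheapest remaining axis
nearest_neighbour_path <- function(start, cost) {
  path <- start
  remaining <- setdiff(seq_len(nrow(cost)), start)
  while (length(remaining) > 0) {
    last <- path[length(path)]
    nxt <- remaining[which.min(cost[last, remaining])]
    path <- c(path, nxt)
    remaining <- setdiff(remaining, nxt)
  }
  path
}

# Two-opt: reverse segments while doing so strictly reduces the path cost
two_opt <- function(path, cost) {
  improved <- TRUE
  n <- length(path)
  while (improved) {
    improved <- FALSE
    for (i in seq_len(n - 1)) {
      for (j in (i + 1):n) {
        candidate <- path
        candidate[i:j] <- rev(path[i:j])
        if (path_cost(candidate, cost) < path_cost(path, cost)) {
          path <- candidate
          improved <- TRUE
        }
      }
    }
  }
  path
}

numeric_data <- iris[vapply(iris, is.numeric, logical(1))]
p <- ncol(numeric_data)

# Pairwise crossing counts between all axes
cost <- matrix(0, p, p, dimnames = list(names(numeric_data), names(numeric_data)))
for (i in seq_len(p)) {
  for (j in seq_len(p)) {
    if (i != j) cost[i, j] <- count_crossings(numeric_data[[i]], numeric_data[[j]])
  }
}

# Repeat the greedy construction from every start, refine, keep the best
candidates <- lapply(seq_len(p), function(s) two_opt(nearest_neighbour_path(s, cost), cost))
candidate_costs <- vapply(candidates, path_cost, numeric(1), cost = cost)
best_path <- candidates[[which.min(candidate_costs)]]
best_order <- names(numeric_data)[best_path]
```

Both steps are heuristics, so the result is a good ordering rather than a guaranteed optimum. For a handful of axes an exhaustive search would also be feasible.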

Community Contributions

All types of contributions are encouraged and valued. See our guide to community contributions for different ways to help.

License

MIT (see LICENSE.md)
