Releases: pyg-team/pyg-lib
pyg-lib 0.5.0
We are excited to announce the release of `pyg-lib` 0.5 🎉🎉🎉
What's Changed
- Increment subgraph id globally for all seed nodes by @kgajdamo in #304
- Fix tests on macOS ARM by @rusty1s in #305
- fix bug to add large tensor support by @kaixuanliu in #308
- Add macOS M1 support by @rusty1s in #310
- Fix macOS nightly build by @rusty1s in #312
- Fix macOS install (part2) by @rusty1s in #313
- Build for Windows by @rusty1s in #315
- Update Windows build by @rusty1s in #317
- Add PyTorch 2.3 support by @rusty1s in #322
- Load `libpyg.so` first to let `torch.library.register_fake` find custom operators by @akihironitta in #329
- Ensure consistent line endings in the repository by @akihironitta in #332
- Add PyTorch 2.4 support by @rusty1s in #338
- Add PyTorch 2.4 support by @rusty1s in #339
- Drop Windows/PyTorch 2.0.0/`cu121` builds by @rusty1s in #340
- Add `sphinx-lint` by @rusty1s in #341
- Support for `sphinx_copybutton` by @rusty1s in #342
- Autodocument type hints by @rusty1s in #343
- Fix docs build in CI by @akihironitta in #351
- Drop Python 3.8 support by @akihironitta in #356
- Add PyTorch 2.5 support by @rusty1s in #360
- Fix typo in `README` by @rusty1s in #362
- Remove unused `enumerate` in `fused_scatter_reduce` by @akihironitta in #370
- Remove usage of `torch::autograd::Variable` by @akihironitta in #369
- Add `cuCollections` dependency by @rusty1s in #371
- Limit concurrency for nightly jobs by @akihironitta in #372
- Boilerplate for custom `HashMap` implementation by @rusty1s in #373
- Basic implementation of `CPUHashMap` by @rusty1s in #375
- Support various data types in `CPUHashMap` by @rusty1s in #376
- Add asserts and default get to `HashMap` by @rusty1s in #377
- Pybind support for `CPUHashMap` by @rusty1s in #378
- `CPUHashMap` benchmark by @rusty1s in #379
- Use multi-threading and parallel hash map in `CPUHashMap` by @rusty1s in #380
- Implement dynamic polymorphism in `CPUHashMap` by @rusty1s in #381
- Restructure file layout for `HashMap` - prepare CUDA version by @rusty1s in #382
- Fix documentation builds by @rusty1s in #384
- Update labeling procedure by @rusty1s in #385
- Re-structure includes by @rusty1s in #388
- Fix CI by @rusty1s in #390
- Add `CUDAHashMapImpl` via `cuCollections` by @rusty1s in #391
- Robustify `CUDAHashMap` implementation + add serialization by @rusty1s in #392
- Align `CPUHashMap` and `CUDAHashMap` implementations by @rusty1s in #393
- [Bug] Fixing bug in `fused_scatter_reduce` function. by @drivanov in #394
- Introduce `load_factor` - use `dtype.int` by default in `HashMap` benchmark by @rusty1s in #396
- Debug Windows build by @rusty1s in #397
- Disable Windows support in `CUDAHashMap` by @rusty1s in #398
- Support `num_submaps` in `CPUHashMap` by @rusty1s in #399
- Remove unnecessary `argsort` in `CUDAHashMap.keys()` by @rusty1s in #400
- Add tests for `HashMap` python bindings by @rusty1s in #401
- Update benchmark script to respect `num_submaps` by @rusty1s in #402
- Expose `size`, `dtype` and `device` in `HashMap` by @rusty1s in #403
- CI: Auto-merge PRs from bots by @akihironitta in #405
- Replace `c10::optional` with equivalent `std::optional` by @akihironitta in #406
- `NeighborSampler` boilerplate by @rusty1s in #413
- Revert: "Replace `c10::optional` with equivalent `std::optional`" by @rusty1s in #416
- Implement a basic hetero sampler in the class by @vid-koci in #415
- Add `MetapathTracker` helper class to `NeighborSampler` by @vid-koci in #417
- Oversample metapaths where number of samples was lower than expected by @vid-koci in #419
- PyTorch 2.6 support by @rusty1s in #421
- Fix `SetDevice` by @rusty1s in #422
- Correctly set CUDA device by @rusty1s in #423
- Add `pyproject.toml` by @rusty1s in #426
- Move stylers/linters to `pyproject.toml` by @rusty1s in #428
- Python 3.13 support by @rusty1s in #429
- Replace random shuffle with a sort in `MetapathAwareNeighborSampler` by @vid-koci in #425
- Bump CI's ubuntu to latest by @Kh4L in #432
- Add CXX11 ABI build support by @Kh4L in #431
- Only trigger automerge workflow on opening a PR by @akihironitta in #434
- Add `NO_METIS` flag by @rusty1s in #436
- Include `Dispatch.h` by @rusty1s in #437
- Fix the NO_METIS env var by @vid-koci in #438
- Add a header for the Neighbor sampler class by @vid-koci in #439
- Additional 0 neighbor checks and a test by @vid-koci in #440
- Fix CUDA architectures to build for in CI by @akihironitta in #443
- ci: Tiny clean up by @akihironitta in #444
- Avoid deprecated format in `project.license` field in `pyproject.toml` by @akihironitta in #446
- Revert "Avoid deprecated format in `project.license` field in `pyproject.toml` (#446)" by @akihironitta in #449
- Fix typo in README.md by @akihironitta in #450
- Support PyTorch 2.7 and CUDA 12.8 by @akihironitta in #442
- ci: Set up dependabot by @akihironitta in #451
- Update `README.md` on PyTorch 2.7 and CUDA 12.8 support by @akihironitta in #456
- Remove PyTorch 1.12 handling from CI by @akihironitta in #459
- Build for all Python versions on Linux by @rusty1s in #465
- Fix Windows build by @rusty1s in #466
- ci: Cancel running jobs when a new commit is pushed to PR by @akihironitta in #469
- ci: Prepare for decoupling build configs from workflows by @akihironitta in #468
- ci: Add reusable workflow building Linux wheels by @akihironitta in #470
- Fix labeler by @akihironitta in #472
- Block commits to local master by @akihironitta in #473
- ci: Remove unnecessary `install.yml` by @akihironitta in #474
- ci: Remove unnecessary `python_testing.yml` by @akihironitta in #475
- ci: Shrink config matrix to run on PRs by @akihironitta in #476
- ci: Add reusable workflow building macOS wheels by @akihironitta in #471
- ci: Add reusable workflow building Windows wheels by @akihironitta in #477
- Trim macOS and Windows build matrix by @akihironitta in #482
- Skip test cases on CPU Windows due to PyTorch 2.4.0 bug by @akihironitta in #483
- Remove `Python.h` by @akihironitta in #462
- Fix automerge workflow by @akihironitta in https://github.com/pyg-t...
pyg-lib 0.4.0: PyTorch 2.2 support, distributed sampling, sparse softmax, edge-level temporal sampling
`pyg-lib==0.4.0` brings PyTorch 2.2 support, distributed neighbor sampling, accelerated softmax operations, and edge-level temporal sampling support to PyG 🎉🎉🎉
Highlights
PyTorch 2.2 Support
`pyg-lib==0.4.0` is fully compatible with PyTorch 2.2 (#294). To install for PyTorch 2.2, simply run

pip install pyg-lib -f https://data.pyg.org/whl/torch-2.2.0+${CUDA}.html

where `${CUDA}` should be replaced by either `cpu`, `cu118` or `cu121`.

The following combinations are supported:

PyTorch 2.2 | `cpu` | `cu118` | `cu121` |
---|---|---|---|
Linux | ✅ | ✅ | ✅ |
macOS | ✅ | | |

Older PyTorch versions like PyTorch 1.12, 1.13, 2.0.0 and 2.1.0 are still supported, and can be installed as described in our `README.md`.
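As a small illustration of the install command above, the wheel index URL can be assembled programmatically. The helper name `wheel_index_url` is hypothetical and not part of `pyg-lib`; the allowed tags are only the ones listed for PyTorch 2.2 in these notes.

```python
def wheel_index_url(torch_version: str, cuda: str) -> str:
    """Build the index URL passed to `pip install pyg-lib -f <url>`."""
    # Tags listed for PyTorch 2.2 in the notes above; other releases list other tags.
    allowed = {"cpu", "cu118", "cu121"}
    if cuda not in allowed:
        raise ValueError(f"unsupported tag {cuda!r}; expected one of {sorted(allowed)}")
    return f"https://data.pyg.org/whl/torch-{torch_version}+{cuda}.html"


url = wheel_index_url("2.2.0", "cu121")
print(f"pip install pyg-lib -f {url}")
```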
Distributed Sampling
`pyg-lib==0.4.0` integrates all the low-level code for performing distributed neighbor sampling as part of `torch_geometric.distributed` in PyG 2.5 (#246, #252, #253, #254).
Sparse Softmax Implementation
`pyg-lib==0.4.0` supports a fast sparse `softmax_csr` implementation based on CSR input representation (#264, #282):

import torch
from pyg_lib.ops import softmax_csr
src = torch.randn(4, 4)
ptr = torch.tensor([0, 4])  # a single segment covering all four rows
out = softmax_csr(src, ptr)
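To make the grouping by `ptr` concrete, here is a pure-Python reference sketch of what a segment-wise softmax computes, assuming the softmax is taken over the rows of each segment `[ptr[i], ptr[i+1])` separately for every column. The function name `softmax_csr_reference` is hypothetical, and this is not how the library implements the operation.

```python
import math
from typing import List


def softmax_csr_reference(src: List[List[float]], ptr: List[int]) -> List[List[float]]:
    """Softmax over each row segment [ptr[i], ptr[i+1]), computed per column."""
    out = [[0.0] * len(row) for row in src]
    for start, end in zip(ptr[:-1], ptr[1:]):
        if start == end:
            continue  # empty segment, nothing to normalize
        for j in range(len(src[start])):
            column = [src[i][j] for i in range(start, end)]
            peak = max(column)  # subtract the maximum for numerical stability
            exps = [math.exp(v - peak) for v in column]
            total = sum(exps)
            for offset, e in enumerate(exps):
                out[start + offset][j] = e / total
    return out


src = [[0.0, 1.0], [0.0, 2.0], [1.0, 1.0]]
ptr = [0, 2, 3]  # rows 0-1 form the first segment, row 2 the second
out = softmax_csr_reference(src, ptr)
```

Within each segment, every column of `out` sums to one.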
Edge-level Temporal Sampling
`pyg-lib==0.4.0` brings edge-level temporal sampling support to PyG (#280). In particular, `neighbor_sample` and `hetero_neighbor_sample` now support the `edge_time` attribute, so that an edge is only sampled if its timestamp is lower than or equal to the corresponding `seed_time`.
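The constraint can be illustrated with a minimal pure-Python sketch. The helper `temporally_valid_edges` is hypothetical and only mirrors the rule stated above (keep an edge when `edge_time <= seed_time`); it is not the library's sampling code.

```python
from typing import List


def temporally_valid_edges(edge_time: List[int], seed_time: int) -> List[int]:
    """Return the positions of edges whose timestamp is <= seed_time."""
    return [i for i, t in enumerate(edge_time) if t <= seed_time]


edge_time = [3, 7, 5, 9]
valid = temporally_valid_edges(edge_time, seed_time=5)
print(valid)  # positions of the edges with timestamps 3 and 5
```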
Additional Features
- Added support for `bfloat16` data type in `segment_matmul` and `grouped_matmul` on CPU (#272)
- Improved the runtime of biased sampling in `neighbor_sample` and `hetero_neighbor_sample` (#270)
Bugfixes
- Dropped the MKL code path in `neighbor_sample` and `hetero_neighbor_sample` with `replace=False` since it did not correctly prevent duplicates (#275)
- Fixed `grouped_matmul` in case input tensors are not contiguous (#290)
New Contributors
Full Changelog: 0.3.0...0.4.0
pyg-lib 0.3.1: Bugfixes
`pyg-lib==0.3.1` includes a variety of bugfixes and improvements.
Bug Fixes
- Fixed an issue introduced in `pyg-lib==0.3.0` in which the `replace=False` option was not correctly respected during `neighbor_sample` (#275)
- Fixed support for older `GLIBC` versions (#276)
Improvements
- Biased `neighbor_sample` has been made approximately twice as fast (#270)
- `segment_matmul` and `grouped_matmul` now support `bfloat16` CPU tensors (#271)
Full Changelog: 0.3.0...0.3.1
pyg-lib 0.3.0: PyTorch 2.1 support, METIS partitioning, neighbor sampler improvements
`pyg-lib==0.3.0` brings PyTorch 2.1 support, METIS partitioning and further neighbor sampling improvements to PyG 🎉🎉🎉
Highlights
PyTorch 2.1 Support
`pyg-lib==0.3.0` is fully compatible with PyTorch 2.1 (#256). To install for PyTorch 2.1, simply run

pip install pyg-lib -f https://data.pyg.org/whl/torch-2.1.0+${CUDA}.html

where `${CUDA}` should be replaced by either `cpu`, `cu118` or `cu121`.

The following combinations are supported:

PyTorch 2.1 | `cpu` | `cu118` | `cu121` |
---|---|---|---|
Linux | ✅ | ✅ | ✅ |
macOS | ✅ | | |

Older PyTorch versions like PyTorch 1.12, 1.13 and 2.0.0 are still supported, and can be installed as described in our `README.md`. PyTorch 1.11 support has been dropped.
METIS partitioning
`pyg-lib==0.3.0` enables METIS partitioning by introducing `pyg_lib.partition` (#229).
from pyg_lib.partition import metis
cluster = metis(rowptr, col, num_partitions)
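The `rowptr` and `col` arguments describe the graph in CSR form. As a minimal sketch of how such arrays relate to an edge list, the hypothetical helper `edges_to_csr` below builds them with plain Python lists; in practice they would be `torch` tensors, which is an assumption not spelled out in these notes.

```python
from typing import List, Tuple


def edges_to_csr(num_nodes: int, edges: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Convert (source, destination) pairs into CSR arrays (rowptr, col)."""
    rowptr = [0] * (num_nodes + 1)
    for src, _ in edges:
        rowptr[src + 1] += 1  # count outgoing edges per source node
    for i in range(num_nodes):
        rowptr[i + 1] += rowptr[i]  # prefix sum turns counts into offsets
    # Sorting by (source, destination) groups each node's neighbors together.
    col = [dst for _, dst in sorted(edges)]
    return rowptr, col


# An undirected path 0 - 1 - 2, stored as directed edges in both directions.
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
rowptr, col = edges_to_csr(3, edges)
```

The neighbors of node `i` are then `col[rowptr[i]:rowptr[i + 1]]`.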
Neighbor Sampling Improvements
`pyg-lib==0.3.0` brings various improvements to our neighbor sampling routine:
- Support for biased/weighted sampling: `pyg_lib.sampler.neighbor_sample` and `pyg_lib.sampler.hetero_neighbor_sample` now support the additional `edge_weight` argument (#247, #251)
- `pyg_lib.sampler.hetero_neighbor_sample` now performs neighborhood sampling across edge types in parallel (#211)
- Added low-level support for distributed neighborhood sampling (#246, #252, #253, #254)
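To illustrate the idea behind biased sampling with `edge_weight`, the following sketch draws neighbors of a node with probability proportional to the weight of the connecting edge. It uses Python's `random.choices`, which samples with replacement; the helper `weighted_neighbor_sample` is hypothetical and does not reflect how the library implements or parameterizes this feature.

```python
import random
from typing import List


def weighted_neighbor_sample(
    rowptr: List[int],
    col: List[int],
    edge_weight: List[float],
    node: int,
    k: int,
    rng: random.Random,
) -> List[int]:
    """Draw k neighbors of `node` (with replacement), biased by edge weight."""
    start, end = rowptr[node], rowptr[node + 1]
    neighbors = col[start:end]
    weights = edge_weight[start:end]
    if not neighbors or sum(weights) <= 0:
        return []  # isolated node or no positive weight to sample from
    return rng.choices(neighbors, weights=weights, k=k)


# Node 0 has neighbors 1, 2 and 3; the edge to node 1 has zero weight.
rowptr = [0, 3, 3, 3, 3]
col = [1, 2, 3]
edge_weight = [0.0, 1.0, 3.0]
samples = weighted_neighbor_sample(rowptr, col, edge_weight, node=0, k=50, rng=random.Random(1))
```

Edges with zero weight are never drawn, and heavier edges are drawn more often on average.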
Additional Features
- Added dispatch for XPU device in `index_sort` (#243)
- Updated `cutlass` version for speed boosts in `segment_matmul` and `grouped_matmul` (#235)
Bugfixes
- Fixed vector-based mapping issue in `Mapping` (#244)
- Fixed performance issues reported by Coverity Tool (#240)
- Fixed TorchScript support in `grouped_matmul` (#220)
New Contributors
- @yaox12 made their first contribution in #213
- @yanbing-j made their first contribution in #231
- @akihironitta made their first contribution in #248
Full Changelog: 0.2.0...0.3.0
pyg-lib 0.2.0: PyTorch 2.0 support, sampled operations, and further accelerations
`pyg-lib==0.2.0` brings PyTorch 2.0 support, sampled operations and further accelerations to PyG 🎉🎉🎉
Highlights
PyTorch 2.0 Support
`pyg-lib==0.2.0` is fully compatible with PyTorch 2.0. To install for PyTorch 2.0, simply run

pip install pyg-lib -f https://data.pyg.org/whl/torch-2.0.0+${CUDA}.html

where `${CUDA}` should be replaced by either `cpu`, `cu117` or `cu118`.

The following combinations are supported:

PyTorch 2.0 | `cpu` | `cu117` | `cu118` |
---|---|---|---|
Linux | ✅ | ✅ | ✅ |
macOS | ✅ | | |

Older PyTorch versions like PyTorch 1.11, 1.12 and 1.13 are still supported, and can be installed as described in our `README.md`.
Sampled Operations
We added support for `sampled_op` implementations (#156, #159, #160), which implement the scheme
out = left_tensor[left_index] (op) right_tensor[right_index]
efficiently without materializing intermediate representations:
from pyg_lib.ops import sampled_add
edge_index = ...
row, col = edge_index
# Replace ...
out = x[row] + x[col]
# ... with
out = sampled_add(left=x, right=x, left_index=row, right_index=col)
Supported operations are `sampled_add`, `sampled_sub`, `sampled_mul` and `sampled_div`.
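The scheme `out = left_tensor[left_index] (op) right_tensor[right_index]` can be written out as a pure-Python reference. The function `sampled_op_reference` is a hypothetical stand-in that shows the expected result only; unlike the library routines, it makes no attempt to avoid intermediate allocations.

```python
import operator
from typing import Callable, List

Matrix = List[List[float]]


def sampled_op_reference(
    op: Callable[[float, float], float],
    left: Matrix,
    right: Matrix,
    left_index: List[int],
    right_index: List[int],
) -> Matrix:
    """Apply `op` element-wise to the selected rows of `left` and `right`."""
    return [
        [op(a, b) for a, b in zip(left[i], right[j])]
        for i, j in zip(left_index, right_index)
    ]


x = [[1.0, 2.0], [10.0, 20.0], [100.0, 200.0]]
row = [0, 1, 2]
col = [1, 2, 0]
added = sampled_op_reference(operator.add, x, x, row, col)
multiplied = sampled_op_reference(operator.mul, x, x, row, col)
```

Passing `operator.sub` or `operator.truediv` gives the reference results for the subtraction and division variants.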
Further Accelerations
- `index_sort` implements a (way) faster alternative to sorting one-dimensional indices compared to `torch.sort()` (#181, #192). This considerably speeds up dataset loading in PyG.
- Optimized `segment_matmul` and `grouped_matmul` CPU implementations via MKL BLAS `gemm_batch` (#146, #172).
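Sorting bounded non-negative integer indices does not require a comparison sort. The sketch below uses a counting sort as a stand-in to show the kind of output such a routine produces, namely the sorted values plus the permutation that sorts the input. The name `index_sort_reference`, its `max_value` parameter and the choice of counting sort are assumptions made for illustration, not a description of the library's algorithm.

```python
from typing import List, Tuple


def index_sort_reference(values: List[int], max_value: int) -> Tuple[List[int], List[int]]:
    """Stable counting sort returning (sorted values, permutation)."""
    counts = [0] * (max_value + 1)
    for v in values:
        counts[v] += 1
    # Exclusive prefix sum: starts[v] is the first output slot for value v.
    starts = [0] * (max_value + 1)
    total = 0
    for v in range(max_value + 1):
        starts[v] = total
        total += counts[v]
    sorted_values = [0] * len(values)
    perm = [0] * len(values)
    for i, v in enumerate(values):
        pos = starts[v]
        sorted_values[pos] = v
        perm[pos] = i  # original position of the element now at `pos`
        starts[v] += 1
    return sorted_values, perm


values = [3, 0, 2, 3, 1]
sorted_values, perm = index_sort_reference(values, max_value=3)
```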
Breaking Changes
- Temporal `neighbor_sample` and `hetero_neighbor_sample` will now sample nodes with the same or smaller timestamp than the seed node (changed from only sampling nodes with a smaller timestamp) (#187)
Full Changelog
Added
- Added PyTorch 2.0 support (#214)
- `neighbor_sample` routines now also return information about the number of sampled nodes/edges per layer (#197)
- Added `index_sort` implementation (#181, #192)
- Added `triton>=2.0` support (#171)
- Added `bias` term to `grouped_matmul` and `segment_matmul` (#161)
- Added `sampled_op` implementation (#156, #159, #160)
Changed
- Sample the nodes with the same timestamp as seed nodes (#187)
- Added `write-csv` (saves benchmark results as csv file) and `libraries` (determines which libraries will be used in benchmark) parameters (#167)
- Enable benchmarking of neighbor sampler on temporal graphs (#165)
- Improved `[segment|grouped]_matmul` CPU implementation via `at::matmul_out` and MKL BLAS `gemm_batch` (#146, #172)
Full commit list: 0.1.0...0.2.0
pyg-lib 0.1.0: Optimized neighborhood sampling and heterogeneous GNN acceleration
We are proud to release `pyg-lib==0.1.0`, the first stable version of our new low-level Graph Neural Network library to drive all CPU and GPU acceleration needs of PyG 🎉🎉🎉
Extensive documentation is provided here. Once `pyg-lib` is installed, it will get automatically picked up by PyG, e.g., during neighborhood sampling or during heterogeneous GNN execution, and will accelerate its computation.
Installation
You can install `pyg-lib` as described in our `README.md`:

pip install pyg-lib -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html

where
- `${TORCH}` should be replaced by either `1.11.0`, `1.12.0` or `1.13.0`
- `${CUDA}` should be replaced by either `cpu`, `cu102`, `cu113`, `cu115`, `cu116` or `cu117`

The following combinations are supported:
PyTorch 1.13 | `cpu` | `cu102` | `cu113` | `cu115` | `cu116` | `cu117` |
---|---|---|---|---|---|---|
Linux | ✅ | | | | ✅ | ✅ |
Windows | | | | | | |
macOS | ✅ | | | | | |

PyTorch 1.12 | `cpu` | `cu102` | `cu113` | `cu115` | `cu116` | `cu117` |
---|---|---|---|---|---|---|
Linux | ✅ | ✅ | ✅ | | ✅ | |
Windows | | | | | | |
macOS | ✅ | | | | | |

PyTorch 1.11 | `cpu` | `cu102` | `cu113` | `cu115` | `cu116` | `cu117` |
---|---|---|---|---|---|---|
Linux | ✅ | ✅ | ✅ | ✅ | | |
Windows | | | | | | |
macOS | ✅ | | | | | |
Highlights
`pyg_lib.sampler`: Optimized homogeneous and heterogeneous neighborhood sampling
`pyg-lib` provides fast and optimized CPU routines to iteratively sample neighbors in homogeneous and heterogeneous graphs, and heavily improves upon the previously used neighborhood sampling techniques utilized in PyG. For example, it pre-allocates random numbers, uses vector-based mapping for nodes in smaller node types, leverages a faster hashmap implementation, etc. Overall, it achieves speed-ups of about 10x-15x.
pyg_lib.sampler.neighbor_sample(
    rowptr: Tensor,
    col: Tensor,
    seed: Tensor,
    num_neighbors: List[int],
    time: Optional[Tensor] = None,
    seed_time: Optional[Tensor] = None,
    csc: bool = False,
    replace: bool = False,
    directed: bool = True,
    disjoint: bool = False,
    temporal_strategy: str = 'uniform',
    return_edge_id: bool = True,
)
and
pyg_lib.sampler.hetero_neighbor_sample(
    rowptr_dict: Dict[EdgeType, Tensor],
    col_dict: Dict[EdgeType, Tensor],
    seed_dict: Dict[NodeType, Tensor],
    num_neighbors_dict: Dict[EdgeType, List[int]],
    time_dict: Optional[Dict[NodeType, Tensor]] = None,
    seed_time_dict: Optional[Dict[NodeType, Tensor]] = None,
    csc: bool = False,
    replace: bool = False,
    directed: bool = True,
    disjoint: bool = False,
    temporal_strategy: str = 'uniform',
    return_edge_id: bool = True,
)
`pyg_lib.sampler.neighbor_sample` and `pyg_lib.sampler.hetero_neighbor_sample` recursively sample neighbors from all node indices in `seed` in the graph given by `(rowptr, col)`. They also support temporal sampling via the `time` argument, such that no nodes will be sampled that do not fulfill the temporal constraints as indicated by `seed_time`.
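The recursive sampling described above can be sketched in plain Python. The function `neighbor_sample_reference` is hypothetical: it samples without replacement, ignores the temporal arguments, and returns only the visited node indices, so it illustrates the hop-by-hop expansion rather than the library's actual routine or return values.

```python
import random
from typing import List


def neighbor_sample_reference(
    rowptr: List[int],
    col: List[int],
    seed: List[int],
    num_neighbors: List[int],
    rng: random.Random,
) -> List[int]:
    """Return visited nodes (seeds first) after len(num_neighbors) hops."""
    visited = list(seed)
    seen = set(seed)
    frontier = list(seed)
    for k in num_neighbors:
        next_frontier = []
        for node in frontier:
            neighbors = col[rowptr[node]:rowptr[node + 1]]
            # Sample at most k distinct neighbors of the current node.
            for n in rng.sample(neighbors, min(k, len(neighbors))):
                if n not in seen:
                    seen.add(n)
                    visited.append(n)
                    next_frontier.append(n)
        frontier = next_frontier  # the next hop expands only newly found nodes
    return visited


# Undirected path 0 - 1 - 2 - 3 in CSR form.
rowptr = [0, 1, 3, 5, 6]
col = [1, 0, 2, 1, 3, 2]
nodes = neighbor_sample_reference(rowptr, col, seed=[0], num_neighbors=[2, 2], rng=random.Random(0))
```

With two hops from node 0 on this path graph, the sketch reaches nodes 1 and 2 but not node 3.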
`pyg_lib.ops`: Heterogeneous GNN acceleration
`pyg-lib` provides efficient GPU-based routines to parallelize workloads in heterogeneous graphs across different node types and edge types. We achieve this by leveraging type-dependent transformations via NVIDIA CUTLASS integration, which is flexible enough to implement most heterogeneous GNNs and remains efficient even for sparse edge types or a large number of different node types:
segment_matmul(inputs: Tensor, ptr: Tensor, other: Tensor) -> Tensor
`pyg_lib.ops.segment_matmul` performs dense-dense matrix multiplication according to segments along the first dimension of `inputs` as given by `ptr`.
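import torch
import pyg_lib
inputs = torch.randn(8, 16)
ptr = torch.tensor([0, 5, 8])  # rows 0-4 use other[0], rows 5-7 use other[1]
other = torch.randn(2, 16, 32)
out = pyg_lib.ops.segment_matmul(inputs, ptr, other)
assert out.size() == (8, 32)
# `==` on tensors is element-wise, so compare with a tolerance instead
# (atol=1e-4 is an assumed value, not taken from the library).
assert torch.allclose(out[0:5], inputs[0:5] @ other[0], atol=1e-4)
assert torch.allclose(out[5:8], inputs[5:8] @ other[1], atol=1e-4)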
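For readers without the library at hand, the segment semantics can be reproduced with a pure-Python reference. The names `matmul` and `segment_matmul_reference` are hypothetical helpers that operate on nested lists; they show which weight matrix each row segment is multiplied with and say nothing about the CUTLASS-based implementation.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense matrix product of two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, column)) for column in zip(*b)] for row in a]


def segment_matmul_reference(inputs: Matrix, ptr: List[int], other: List[Matrix]) -> Matrix:
    """Multiply rows [ptr[i], ptr[i+1]) of `inputs` with `other[i]`."""
    out: Matrix = []
    for i, (start, end) in enumerate(zip(ptr[:-1], ptr[1:])):
        out.extend(matmul(inputs[start:end], other[i]))
    return out


inputs = [[1, 2], [3, 4], [5, 6]]
ptr = [0, 2, 3]  # first two rows use other[0], the last row uses other[1]
other = [
    [[1, 0], [0, 1]],  # identity
    [[2, 0], [0, 2]],  # scaling by two
]
out = segment_matmul_reference(inputs, ptr, other)
```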
Full Changelog
Added
- Added PyTorch 1.13 support (#145)
- Added native PyTorch support for `grouped_matmul` (#137)
- Added `fused_scatter_reduce` operation for multiple reductions (#141, #142)
- Added `triton` dependency (#133, #134)
- Enable `pytest` testing (#132)
- Added C++-based autograd and TorchScript support for `segment_matmul` (#120, #122)
- Allow overriding `time` for seed nodes via `seed_time` in `neighbor_sample` (#118)
- Added `[segment|grouped]_matmul` CPU implementation (#111)
- Added `temporal_strategy` option to `neighbor_sample` (#114)
- Added benchmarking tool (Google Benchmark) along with `pyg::sampler::Mapper` benchmark example (#101)
- Added CSC mode to `pyg::sampler::neighbor_sample` and `pyg::sampler::hetero_neighbor_sample` (#95, #96)
- Speed up `pyg::sampler::neighbor_sample` via `IndexTracker` implementation (#84)
- Added `pyg::sampler::hetero_neighbor_sample` implementation (#90, #92, #94, #97, #98, #99, #102, #110)
- Added `pyg::utils::to_vector` implementation (#88)
- Added support for PyTorch 1.12 (#57, #58)
- Added `grouped_matmul` and `segment_matmul` CUDA implementations via `cutlass` (#51, #56, #61, #64, #69, #73, #123)
- Added `pyg::sampler::neighbor_sample` implementation (#54, #76, #77, #78, #80, #81, #85, #86, #87, #89)
- Added `pyg::sampler::Mapper` utility for mapping global to local node indices (#45, #83)
- Added benchmark script (#45, #79, #82, #91, #93, #106)
- Added download script for benchmark data (#44)
- Added `biased sampling` utils (#38)
- Added `CHANGELOG.md` (#39)
- Added `pyg.subgraph()` (#31)
- Added nightly builds ([#28](https://github.com...