torch_geometric.loader
DataLoader – A data loader which merges data objects from a torch_geometric.data.Dataset to a mini-batch.
NeighborLoader – A data loader that performs neighbor sampling as introduced in the "Inductive Representation Learning on Large Graphs" paper.
LinkNeighborLoader – A link-based data loader derived as an extension of the node-based torch_geometric.loader.NeighborLoader.
HGTLoader – The Heterogeneous Graph Sampler from the "Heterogeneous Graph Transformer" paper.
ClusterData – Clusters/partitions a graph data object into multiple subgraphs, as motivated by the "Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks" paper.
ClusterLoader – The data loader scheme from the "Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks" paper which merges partitioned subgraphs and their between-cluster links from a large-scale graph data object to form a mini-batch.
GraphSAINTSampler – The GraphSAINT sampler base class from the "GraphSAINT: Graph Sampling Based Inductive Learning Method" paper.
GraphSAINTNodeSampler – The GraphSAINT node sampler class (see GraphSAINTSampler).
GraphSAINTEdgeSampler – The GraphSAINT edge sampler class (see GraphSAINTSampler).
GraphSAINTRandomWalkSampler – The GraphSAINT random walk sampler class (see GraphSAINTSampler).
ShaDowKHopSampler – The ShaDow \(k\)-hop sampler from the "Decoupling the Depth and Scope of Graph Neural Networks" paper.
RandomNodeSampler – A data loader that randomly samples nodes within a graph and returns their induced subgraph.
DataListLoader – A data loader which batches data objects from a torch_geometric.data.Dataset to a Python list.
DenseDataLoader – A data loader which batches data objects from a torch_geometric.data.Dataset to a torch_geometric.data.Batch object by stacking all attributes in a new dimension.
TemporalDataLoader – A data loader which merges successive events of a torch_geometric.data.TemporalData to a mini-batch.
NeighborSampler – The neighbor sampler from the "Inductive Representation Learning on Large Graphs" paper, which allows for mini-batch training of GNNs on large-scale graphs where full-batch training is not feasible.
ImbalancedSampler – A weighted random sampler that randomly samples elements according to class distribution.
- class DataLoader(dataset: Union[Dataset, List[BaseData]], batch_size: int = 1, shuffle: bool = False, follow_batch: Optional[List[str]] = None, exclude_keys: Optional[List[str]] = None, **kwargs)[source]
A data loader which merges data objects from a torch_geometric.data.Dataset to a mini-batch. Data objects can be either of type Data or HeteroData.
- Parameters
dataset (Dataset) – The dataset from which to load the data.
batch_size (int, optional) – How many samples per batch to load. (default: 1)
shuffle (bool, optional) – If set to True, the data will be reshuffled at every epoch. (default: False)
follow_batch (List[str], optional) – Creates assignment batch vectors for each key in the list. (default: None)
exclude_keys (List[str], optional) – Will exclude each key in the list. (default: None)
**kwargs (optional) – Additional arguments of torch.utils.data.DataLoader.
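To make "merging data objects to a mini-batch" more concrete, here is a minimal, library-free sketch. It assumes each graph is a plain dict with a node count and an edge list (a hypothetical stand-in for Data objects, not the library's implementation): edge indices are shifted by a running node offset so the graphs stay disconnected, and a batch vector records which graph each node came from.

```python
def collate_graphs(graphs):
    """Merge small graphs into one disconnected graph (illustrative only)."""
    edges, batch, offset = [], [], 0
    for graph_id, g in enumerate(graphs):
        # Shift node indices so they stay unique in the merged graph.
        edges += [(src + offset, dst + offset) for src, dst in g["edges"]]
        # Batch vector: one entry per node, naming its source graph.
        batch += [graph_id] * g["num_nodes"]
        offset += g["num_nodes"]
    return {"num_nodes": offset, "edges": edges, "batch": batch}


graphs = [
    {"num_nodes": 3, "edges": [(0, 1), (1, 2)]},
    {"num_nodes": 2, "edges": [(0, 1)]},
]
merged = collate_graphs(graphs)
print(merged["edges"])  # [(0, 1), (1, 2), (3, 4)]
print(merged["batch"])  # [0, 0, 0, 1, 1]
```

The follow_batch option can be read as requesting the same kind of assignment vector for additional keys.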
- class NeighborLoader(data: Union[Data, HeteroData, Tuple[FeatureStore, GraphStore]], num_neighbors: Union[List[int], Dict[Tuple[str, str, str], List[int]]], input_nodes: Union[Tensor, None, str, Tuple[str, Optional[Tensor]]] = None, replace: bool = False, directed: bool = True, time_attr: Optional[str] = None, transform: Optional[Callable] = None, is_sorted: bool = False, filter_per_worker: bool = False, neighbor_sampler: Optional[NeighborSampler] = None, **kwargs)[source]
A data loader that performs neighbor sampling as introduced in the “Inductive Representation Learning on Large Graphs” paper. This loader allows for mini-batch training of GNNs on large-scale graphs where full-batch training is not feasible.
More specifically, num_neighbors denotes how many neighbors are sampled for each node in each iteration. NeighborLoader takes in this list of num_neighbors and iteratively samples num_neighbors[i] neighbors for each node involved in iteration i - 1.
Sampled nodes are sorted based on the order in which they were sampled. In particular, the first batch_size nodes represent the set of original mini-batch nodes.

    from torch_geometric.datasets import Planetoid
    from torch_geometric.loader import NeighborLoader

    # `path` is the directory in which the dataset is stored.
    data = Planetoid(path, name='Cora')[0]

    loader = NeighborLoader(
        data,
        # Sample 30 neighbors for each node for 2 iterations
        num_neighbors=[30] * 2,
        # Use a batch size of 128 for sampling training nodes
        batch_size=128,
        input_nodes=data.train_mask,
    )

    sampled_data = next(iter(loader))
    print(sampled_data.batch_size)
    >>> 128

By default, the data loader will only include the edges that were originally sampled (directed = True). This option should only be used in case the number of hops is equivalent to the number of GNN layers. In case the number of GNN layers is greater than the number of hops, consider setting directed = False, which will include all edges between all sampled nodes (but is slightly slower as a result).
Furthermore, NeighborLoader works for both homogeneous graphs stored via Data as well as heterogeneous graphs stored via HeteroData. When operating on heterogeneous graphs, more fine-grained control over the number of sampled neighbors of individual edge types is possible, but not necessary:

    from torch_geometric.datasets import OGB_MAG
    from torch_geometric.loader import NeighborLoader

    hetero_data = OGB_MAG(path)[0]

    loader = NeighborLoader(
        hetero_data,
        # Sample 30 neighbors for each node and edge type for 2 iterations
        num_neighbors={key: [30] * 2 for key in hetero_data.edge_types},
        # Use a batch size of 128 for sampling training nodes of type paper
        batch_size=128,
        input_nodes=('paper', hetero_data['paper'].train_mask),
    )

    sampled_hetero_data = next(iter(loader))
    print(sampled_hetero_data['paper'].batch_size)
    >>> 128

Note
For an example of using NeighborLoader, see examples/hetero/to_hetero_mag.py.
The NeighborLoader will return subgraphs where global node indices are mapped to local indices corresponding to this specific subgraph. However, it is often desirable to map the nodes of the current subgraph back to the global node indices. A simple trick to achieve this is to include this mapping as part of the data object:

    import torch

    # Assign each node its global node index:
    data.n_id = torch.arange(data.num_nodes)

    loader = NeighborLoader(data, ...)
    sampled_data = next(iter(loader))
    print(sampled_data.n_id)

- Parameters
data (torch_geometric.data.Data or torch_geometric.data.HeteroData) – The Data or HeteroData graph object.
num_neighbors (List[int] or Dict[Tuple[str, str, str], List[int]]) – The number of neighbors to sample for each node in each iteration. In heterogeneous graphs, may also take in a dictionary denoting the number of neighbors to sample for each individual edge type. If an entry is set to -1, all neighbors will be included.
input_nodes (torch.Tensor or str or Tuple[str, torch.Tensor]) – The indices of nodes for which neighbors are sampled to create mini-batches. Needs to be given as either a torch.LongTensor or a torch.BoolTensor. If set to None, all nodes will be considered. In heterogeneous graphs, needs to be passed as a tuple that holds the node type and node indices. (default: None)
replace (bool, optional) – If set to True, will sample with replacement. (default: False)
directed (bool, optional) – If set to False, will include all edges between all sampled nodes. (default: True)
time_attr (str, optional) – The name of the attribute that denotes timestamps for the nodes in the graph. If set, temporal sampling will be used such that neighbors are guaranteed to fulfill temporal constraints, i.e. neighbors have an earlier timestamp than the center node. (default: None)
transform (Callable, optional) – A function/transform that takes in a sampled mini-batch and returns a transformed version. (default: None)
is_sorted (bool, optional) – If set to True, assumes that edge_index is sorted by column. This avoids internal re-sorting of the data and can improve runtime and memory efficiency. (default: False)
filter_per_worker (bool, optional) – If set to True, will filter the returning data in each worker’s subprocess rather than in the main process. Setting this to True is generally not recommended: (1) it may result in too many open file handles, (2) it may slow down data loading, (3) it requires operating on CPU tensors. (default: False)
**kwargs (optional) – Additional arguments of torch.utils.data.DataLoader, such as batch_size, shuffle, drop_last or num_workers.
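The iterative sampling scheme and the global-to-local index mapping described above can be sketched without any library. This is an illustration under stated assumptions, not the loader's actual implementation: the graph is a hypothetical adjacency dict mapping each node to the nodes that send messages to it, and sample_subgraph is a name chosen for this example.

```python
import random


def sample_subgraph(adj, seeds, num_neighbors, rng):
    """Layer-wise neighbor sampling on an adjacency dict (illustrative only)."""
    nodes = list(seeds)  # seeds come first, in their given order
    local = {n: i for i, n in enumerate(nodes)}  # global id -> local index
    edges, frontier = [], list(seeds)
    for k in num_neighbors:
        next_frontier = []
        for dst in frontier:
            neighbors = adj.get(dst, [])
            # k == -1 keeps all neighbors; otherwise sample without replacement.
            picked = neighbors if k < 0 else rng.sample(neighbors, min(k, len(neighbors)))
            for src in picked:
                if src not in local:
                    local[src] = len(nodes)
                    nodes.append(src)
                    next_frontier.append(src)
                edges.append((local[src], local[dst]))  # stored in local indices
        frontier = next_frontier  # only newly found nodes are expanded next
    return nodes, edges


adj = {0: [1, 2, 3], 1: [0, 4], 2: [0], 3: [0, 4], 4: [1, 3]}
n_id, edge_index = sample_subgraph(adj, seeds=[0, 1], num_neighbors=[2, 2], rng=random.Random(0))
print(n_id[:2])  # [0, 1]: the first batch_size entries are the seed nodes
```

Here n_id plays the role of the global index attribute from the trick above: n_id[i] is the global index of local node i.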
- class LinkNeighborLoader(data: Union[Data, HeteroData], num_neighbors: Union[List[int], Dict[Tuple[str, str, str], List[int]]], edge_label_index: Union[Tensor, None, Tuple[str, str, str], Tuple[Tuple[str, str, str], Optional[Tensor]]] = None, edge_label: Optional[Tensor] = None, replace: bool = False, directed: bool = True, neg_sampling_ratio: float = 0.0, time_attr: Optional[str] = None, transform: Optional[Callable] = None, is_sorted: bool = False, filter_per_worker: bool = False, neighbor_sampler: Optional[LinkNeighborSampler] = None, **kwargs)[source]
A link-based data loader derived as an extension of the node-based torch_geometric.loader.NeighborLoader. This loader allows for mini-batch training of GNNs on large-scale graphs where full-batch training is not feasible.
More specifically, this loader first selects a sample of edges from the set of input edges edge_label_index (which may or may not be edges in the original graph) and then constructs a subgraph from all the nodes present in this list by sampling num_neighbors neighbors in each iteration.

    from torch_geometric.datasets import Planetoid
    from torch_geometric.loader import LinkNeighborLoader

    data = Planetoid(path, name='Cora')[0]

    loader = LinkNeighborLoader(
        data,
        # Sample 30 neighbors for each node for 2 iterations
        num_neighbors=[30] * 2,
        # Use a batch size of 128 for sampling training edges
        batch_size=128,
        edge_label_index=data.edge_index,
    )

    sampled_data = next(iter(loader))
    print(sampled_data)
    >>> Data(x=[1368, 1433], edge_index=[2, 3103], y=[1368], train_mask=[1368], val_mask=[1368], test_mask=[1368], edge_label_index=[2, 128])

It is additionally possible to provide edge labels for sampled edges, which are then added to the batch:

    import torch

    loader = LinkNeighborLoader(
        data,
        num_neighbors=[30] * 2,
        batch_size=128,
        edge_label_index=data.edge_index,
        edge_label=torch.ones(data.edge_index.size(1)),
    )

    sampled_data = next(iter(loader))
    print(sampled_data)
    >>> Data(x=[1368, 1433], edge_index=[2, 3103], y=[1368], train_mask=[1368], val_mask=[1368], test_mask=[1368], edge_label_index=[2, 128], edge_label=[128])

The rest of the functionality mirrors that of NeighborLoader, including support for heterogeneous graphs.
Note
neg_sampling_ratio is currently implemented in an approximate way, i.e. negative edges may contain false negatives. time_attr is currently implemented such that for an edge (src_node, dst_node), the neighbors of src_node can have a later timestamp than dst_node or vice-versa.
- Parameters
data (torch_geometric.data.Data or torch_geometric.data.HeteroData) – The Data or HeteroData graph object.
num_neighbors (List[int] or Dict[Tuple[str, str, str], List[int]]) – The number of neighbors to sample for each node in each iteration. In heterogeneous graphs, may also take in a dictionary denoting the number of neighbors to sample for each individual edge type. If an entry is set to -1, all neighbors will be included.
edge_label_index (Tensor or EdgeType or Tuple[EdgeType, Tensor]) – The edge indices for which neighbors are sampled to create mini-batches. If set to None, all edges will be considered. In heterogeneous graphs, needs to be passed as a tuple that holds the edge type and corresponding edge indices. (default: None)
edge_label (Tensor) – The labels of edge indices for which neighbors are sampled. Must be the same length as the edge_label_index. If set to None, no labels are returned in the batch.
replace (bool, optional) – If set to True, will sample with replacement. (default: False)
directed (bool, optional) – If set to False, will include all edges between all sampled nodes. (default: True)
neg_sampling_ratio (float, optional) – The ratio of sampled negative edges to the number of positive edges. If edge_label does not exist, it will be automatically created and represents a binary classification task (1 = edge, 0 = no edge). If edge_label exists, it has to be a categorical label from 0 to num_classes - 1. After negative sampling, label 0 represents negative edges, and labels 1 to num_classes represent the labels of positive edges. Note that returned labels are of type torch.float for binary classification (to facilitate the ease-of-use of F.binary_cross_entropy()) and of type torch.long for multi-class classification (to facilitate the ease-of-use of F.cross_entropy()). (default: 0.0)
time_attr (str, optional) – The name of the attribute that denotes timestamps for the nodes in the graph. If set, temporal sampling will be used such that neighbors are guaranteed to fulfill temporal constraints, i.e. neighbors have an earlier timestamp than the center node. (default: None)
transform (Callable, optional) – A function/transform that takes in a sampled mini-batch and returns a transformed version. (default: None)
is_sorted (bool, optional) – If set to True, assumes that edge_index is sorted by column. This avoids internal re-sorting of the data and can improve runtime and memory efficiency. (default: False)
filter_per_worker (bool, optional) – If set to True, will filter the returning data in each worker’s subprocess rather than in the main process. Setting this to True is generally not recommended: (1) it may result in too many open file handles, (2) it may slow down data loading, (3) it requires operating on CPU tensors. (default: False)
**kwargs (optional) – Additional arguments of torch.utils.data.DataLoader, such as batch_size, shuffle, drop_last or num_workers.
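The labeling rules of neg_sampling_ratio are easier to follow with a small worked example. The sketch below is library-free and only mirrors the description above. The function name, the placement of negatives after positives, and the uniform choice of random node pairs are assumptions made for illustration.

```python
import random


def add_negative_edges(pos_edges, num_nodes, neg_sampling_ratio, rng, edge_label=None):
    """Append random negative edges and build labels (illustrative only)."""
    num_neg = int(len(pos_edges) * neg_sampling_ratio)
    # Approximate sampling: random pairs are not checked against existing
    # edges, so some "negatives" may in fact be false negatives.
    neg_edges = [(rng.randrange(num_nodes), rng.randrange(num_nodes)) for _ in range(num_neg)]
    if edge_label is None:
        # Binary task: float labels, 1 = edge, 0 = no edge.
        labels = [1.0] * len(pos_edges) + [0.0] * num_neg
    else:
        # Multi-class task: shift classes by one so that 0 marks negatives.
        labels = [y + 1 for y in edge_label] + [0] * num_neg
    return pos_edges + neg_edges, labels


pos = [(0, 1), (1, 2), (2, 3), (3, 0)]
edges, binary = add_negative_edges(pos, 4, 0.5, random.Random(0))
print(binary)  # [1.0, 1.0, 1.0, 1.0, 0.0, 0.0]
_, multi = add_negative_edges(pos, 4, 0.5, random.Random(0), edge_label=[0, 2, 1, 0])
print(multi)   # [1, 3, 2, 1, 0, 0]
```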
- class HGTLoader(data: HeteroData, num_samples: Union[List[int], Dict[str, List[int]]], input_nodes: Union[str, Tuple[str, Optional[Tensor]]], transform: Optional[Callable] = None, filter_per_worker: bool = False, **kwargs)[source]
The Heterogeneous Graph Sampler from the “Heterogeneous Graph Transformer” paper. This loader allows for mini-batch training of GNNs on large-scale graphs where full-batch training is not feasible.
HGTLoader tries to (1) keep a similar number of nodes and edges for each type and (2) keep the sampled sub-graph dense to minimize the information loss and reduce the sample variance.
Concretely, HGTLoader keeps track of a node budget for each node type, which is then used to determine the sampling probability of a node. In particular, the probability of sampling a node is determined by the number of connections to already sampled nodes and their node degrees. With this, HGTLoader will sample a fixed number of neighbors for each node type in each iteration, as given by the num_samples argument.
Sampled nodes are sorted based on the order in which they were sampled. In particular, the first batch_size nodes represent the set of original mini-batch nodes.
Note
For an example of using HGTLoader, see examples/hetero/to_hetero_mag.py.

    from torch_geometric.loader import HGTLoader
    from torch_geometric.datasets import OGB_MAG

    hetero_data = OGB_MAG(path)[0]

    loader = HGTLoader(
        hetero_data,
        # Sample 512 nodes per type and per iteration for 4 iterations
        num_samples={key: [512] * 4 for key in hetero_data.node_types},
        # Use a batch size of 128 for sampling training nodes of type paper
        batch_size=128,
        input_nodes=('paper', hetero_data['paper'].train_mask),
    )

    sampled_hetero_data = next(iter(loader))
    print(sampled_hetero_data['paper'].batch_size)
    >>> 128

- Parameters
data (torch_geometric.data.HeteroData) – The HeteroData graph data object.
num_samples (List[int] or Dict[str, List[int]]) – The number of nodes to sample in each iteration and for each node type. If given as a list, will sample the same number of nodes for each node type.
input_nodes (str or Tuple[str, torch.Tensor]) – The indices of nodes for which neighbors are sampled to create mini-batches. Needs to be passed as a tuple that holds the node type and corresponding node indices. Node indices need to be given as either a torch.LongTensor or a torch.BoolTensor. If node indices are set to None, all nodes of this specific type will be considered.
transform (Callable, optional) – A function/transform that takes in a sampled mini-batch and returns a transformed version. (default: None)
filter_per_worker (bool, optional) – If set to True, will filter the returning data in each worker’s subprocess rather than in the main process. Setting this to True is generally not recommended: (1) it may result in too many open file handles, (2) it may slow down data loading, (3) it requires operating on CPU tensors. (default: False)
**kwargs (optional) – Additional arguments of torch.utils.data.DataLoader, such as batch_size, shuffle, drop_last or num_workers.
- class ClusterData(data, num_parts: int, recursive: bool = False, save_dir: Optional[str] = None, log: bool = True)[source]
Clusters/partitions a graph data object into multiple subgraphs, as motivated by the “Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks” paper.
- Parameters
data (torch_geometric.data.Data) – The graph data object.
num_parts (int) – The number of partitions.
recursive (bool, optional) – If set to True, will use multilevel recursive bisection instead of multilevel k-way partitioning. (default: False)
save_dir (string, optional) – If set, will save the partitioned data to the save_dir directory for faster re-use. (default: None)
log (bool, optional) – If set to False, will not log any progress. (default: True)
- class ClusterLoader(cluster_data, **kwargs)[source]
The data loader scheme from the “Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks” paper which merges partitioned subgraphs and their between-cluster links from a large-scale graph data object to form a mini-batch.
Note
Use ClusterData and ClusterLoader in conjunction to form mini-batches of clusters. For an example of using Cluster-GCN, see examples/cluster_gcn_reddit.py or examples/cluster_gcn_ppi.py.
- Parameters
cluster_data (torch_geometric.loader.ClusterData) – The already partitioned data object.
**kwargs (optional) – Additional arguments of torch.utils.data.DataLoader, such as batch_size, shuffle, drop_last or num_workers.
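The phrase "merges partitioned subgraphs and their between-cluster links" can be illustrated with a short, library-free sketch. It assumes a hypothetical node-to-partition list part_of (in practice the assignment would come from ClusterData) and is not the loader's actual implementation.

```python
def merge_clusters(edges, part_of, cluster_ids):
    """Induced subgraph on the union of the chosen clusters (illustrative only)."""
    chosen = set(cluster_ids)
    nodes = [n for n, p in enumerate(part_of) if p in chosen]
    local = {n: i for i, n in enumerate(nodes)}
    # Keeping every edge with both endpoints in the batch restores the links
    # between the selected clusters, not just the links inside each one.
    sub_edges = [(local[s], local[d]) for s, d in edges if s in local and d in local]
    return nodes, sub_edges


edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
part_of = [0, 0, 1, 1, 2, 2]  # node -> partition id (hypothetical assignment)
nodes, sub_edges = merge_clusters(edges, part_of, cluster_ids=[0, 1])
print(nodes)      # [0, 1, 2, 3]
print(sub_edges)  # [(0, 1), (1, 2), (2, 3)]
```

The edge (1, 2) crosses partitions 0 and 1 and is kept because both clusters are in the batch, while (3, 4) is dropped because partition 2 is not.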
- class GraphSAINTSampler(data, batch_size: int, num_steps: int = 1, sample_coverage: int = 0, save_dir: Optional[str] = None, log: bool = True, **kwargs)[source]
The GraphSAINT sampler base class from the “GraphSAINT: Graph Sampling Based Inductive Learning Method” paper. Given a graph in a data object, this class samples nodes and constructs subgraphs that can be processed in a mini-batch fashion. Normalization coefficients for each mini-batch are given via node_norm and edge_norm data attributes.
Note
See GraphSAINTNodeSampler, GraphSAINTEdgeSampler and GraphSAINTRandomWalkSampler for currently supported samplers. For an example of using GraphSAINT sampling, see examples/graph_saint.py.
- Parameters
data (torch_geometric.data.Data) – The graph data object.
batch_size (int) – The approximate number of samples per batch.
num_steps (int, optional) – The number of iterations per epoch. (default: 1)
sample_coverage (int) – How many samples per node should be used to compute normalization statistics. (default: 0)
save_dir (string, optional) – If set, will save normalization statistics to the save_dir directory for faster re-use. (default: None)
log (bool, optional) – If set to False, will not log any pre-processing progress. (default: True)
**kwargs (optional) – Additional arguments of torch.utils.data.DataLoader, such as batch_size or num_workers.
- class GraphSAINTNodeSampler(data, batch_size: int, num_steps: int = 1, sample_coverage: int = 0, save_dir: Optional[str] = None, log: bool = True, **kwargs)[source]
The GraphSAINT node sampler class (see GraphSAINTSampler).
- class GraphSAINTEdgeSampler(data, batch_size: int, num_steps: int = 1, sample_coverage: int = 0, save_dir: Optional[str] = None, log: bool = True, **kwargs)[source]
The GraphSAINT edge sampler class (see GraphSAINTSampler).
- class GraphSAINTRandomWalkSampler(data, batch_size: int, walk_length: int, num_steps: int = 1, sample_coverage: int = 0, save_dir: Optional[str] = None, log: bool = True, **kwargs)[source]
The GraphSAINT random walk sampler class (see GraphSAINTSampler).
- Parameters
walk_length (int) – The length of each random walk.
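As a rough illustration of how walk_length shapes a random-walk sample, the following library-free sketch starts one walk per root node and collects every visited node. The adjacency dict and the function name are assumptions for this example; the real sampler additionally builds the induced subgraph and the normalization coefficients.

```python
import random


def random_walk_nodes(adj, roots, walk_length, rng):
    """Collect the nodes visited by one walk per root (illustrative only)."""
    visited = set(roots)
    for node in roots:
        for _ in range(walk_length):
            neighbors = adj.get(node, [])
            if not neighbors:  # dead end: stop this walk early
                break
            node = rng.choice(neighbors)
            visited.add(node)
    return sorted(visited)


adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
rng = random.Random(0)
roots = [rng.randrange(4) for _ in range(2)]  # two root nodes, i.e. batch_size = 2
sub_nodes = random_walk_nodes(adj, roots, walk_length=2, rng=rng)
print(len(sub_nodes) <= len(roots) * (2 + 1))  # True
```

Each walk contributes at most walk_length + 1 nodes, so the subgraph size grows with both the number of roots and the walk length.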
- class ShaDowKHopSampler(data: Data, depth: int, num_neighbors: int, node_idx: Optional[Tensor] = None, replace: bool = False, **kwargs)[source]
The ShaDow \(k\)-hop sampler from the “Decoupling the Depth and Scope of Graph Neural Networks” paper. Given a graph in a data object, the sampler will create shallow, localized subgraphs. A deep GNN on this local graph then smooths the informative local signals.
Note
For an example of using ShaDowKHopSampler, see examples/shadow.py.
- Parameters
data (torch_geometric.data.Data) – The graph data object.
depth (int) – The depth/number of hops of the localized subgraph.
num_neighbors (int) – The number of neighbors to sample for each node in each hop.
node_idx (LongTensor or BoolTensor, optional) – The nodes that should be considered for creating mini-batches. If set to None, all nodes will be considered.
replace (bool, optional) – If set to True, will sample neighbors with replacement. (default: False)
**kwargs (optional) – Additional arguments of torch.utils.data.DataLoader, such as batch_size or num_workers.
- class RandomNodeSampler(data, num_parts: int, shuffle: bool = False, **kwargs)[source]
A data loader that randomly samples nodes within a graph and returns their induced subgraph.
Note
For an example of using RandomNodeSampler, see examples/ogbn_proteins_deepgcn.py.
- Parameters
data (torch_geometric.data.Data) – The graph data object.
num_parts (int) – The number of partitions.
shuffle (bool, optional) – If set to True, the data is reshuffled at every epoch. (default: False)
**kwargs (optional) – Additional arguments of torch.utils.data.DataLoader, such as num_workers.
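One way to picture random node sampling with num_parts partitions is sketched below, without any library. It assumes that every node is assigned to one of num_parts parts uniformly at random and that each part yields the subgraph induced by its nodes. This is an illustration of the idea, not a statement about the loader's internals.

```python
import random


def random_node_partitions(num_nodes, edges, num_parts, rng):
    """Assign nodes to random parts and return each induced subgraph."""
    part_of = [rng.randrange(num_parts) for _ in range(num_nodes)]
    batches = []
    for part in range(num_parts):
        nodes = [n for n in range(num_nodes) if part_of[n] == part]
        keep = set(nodes)
        # Induced subgraph: keep only edges with both endpoints in this part.
        sub_edges = [(s, d) for s, d in edges if s in keep and d in keep]
        batches.append((nodes, sub_edges))
    return batches


edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
batches = random_node_partitions(4, edges, num_parts=2, rng=random.Random(0))
for nodes, sub_edges in batches:
    print(nodes, sub_edges)
```

Edges whose endpoints land in different parts are dropped for that epoch, which is why reshuffling the assignment between epochs can matter.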
- class DataListLoader(dataset: Union[Dataset, List[BaseData]], batch_size: int = 1, shuffle: bool = False, **kwargs)[source]
A data loader which batches data objects from a torch_geometric.data.Dataset to a Python list. Data objects can be either of type Data or HeteroData.
Note
This data loader should be used for multi-GPU support via torch_geometric.nn.DataParallel.
- Parameters
dataset (Dataset) – The dataset from which to load the data.
batch_size (int, optional) – How many samples per batch to load. (default: 1)
shuffle (bool, optional) – If set to True, the data will be reshuffled at every epoch. (default: False)
**kwargs (optional) – Additional arguments of torch.utils.data.DataLoader, such as drop_last or num_workers.
- class DenseDataLoader(dataset: Union[Dataset, List[Data]], batch_size: int = 1, shuffle: bool = False, **kwargs)[source]
A data loader which batches data objects from a torch_geometric.data.Dataset to a torch_geometric.data.Batch object by stacking all attributes in a new dimension.
Note
To make use of this data loader, all graph attributes in the dataset need to have the same shape. In particular, this data loader should only be used when working with dense adjacency matrices.
- Parameters
dataset (Dataset) – The dataset from which to load the data.
batch_size (int, optional) – How many samples per batch to load. (default: 1)
shuffle (bool, optional) – If set to True, the data will be reshuffled at every epoch. (default: False)
**kwargs (optional) – Additional arguments of torch.utils.data.DataLoader, such as drop_last or num_workers.
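The requirement that all attributes share the same shape follows directly from stacking. The library-free sketch below uses nested lists as hypothetical stand-ins for dense 2-D tensors and shows both the stacked result and the failure mode for mismatched shapes.

```python
def stack_dense(graphs):
    """Stack equally shaped 2-D attributes along a new leading dimension."""
    batch = {}
    for key in graphs[0]:
        shapes = {(len(g[key]), len(g[key][0])) for g in graphs}
        if len(shapes) != 1:
            raise ValueError(f"attribute {key!r} has differing shapes: {sorted(shapes)}")
        batch[key] = [g[key] for g in graphs]  # shape: [num_graphs, rows, cols]
    return batch


graphs = [
    {"adj": [[0, 1], [1, 0]], "x": [[1.0], [2.0]]},
    {"adj": [[0, 0], [0, 0]], "x": [[3.0], [4.0]]},
]
batch = stack_dense(graphs)
print(len(batch["adj"]), len(batch["adj"][0]), len(batch["adj"][0][0]))  # 2 2 2
```

Graphs with different node counts would first need padding to a common size before they could be stacked this way.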
- class TemporalDataLoader(data: TemporalData, batch_size: int = 1, **kwargs)[source]
A data loader which merges successive events of a torch_geometric.data.TemporalData to a mini-batch.
- Parameters
data (TemporalData) – The TemporalData from which to load the data.
batch_size (int, optional) – How many samples per batch to load. (default: 1)
**kwargs (optional) – Additional arguments of torch.utils.data.DataLoader.
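"Merging successive events" amounts to slicing a time-ordered event stream into consecutive chunks. A minimal sketch, assuming events are given as three parallel lists (source, destination, timestamp) already sorted by time; the function name is chosen for this example only.

```python
def temporal_batches(src, dst, t, batch_size):
    """Yield consecutive, non-shuffled slices of a time-ordered event list."""
    for start in range(0, len(t), batch_size):
        end = start + batch_size
        yield src[start:end], dst[start:end], t[start:end]


src, dst, t = [0, 1, 2, 0, 3], [1, 2, 0, 3, 1], [10, 11, 15, 20, 21]
batches = list(temporal_batches(src, dst, t, batch_size=2))
print([b[2] for b in batches])  # [[10, 11], [15, 20], [21]]
```

Keeping the chunks in order preserves the chronology, so a model never sees an event before the ones that precede it.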
- class NeighborSampler(edge_index: Union[Tensor, SparseTensor], sizes: List[int], node_idx: Optional[Tensor] = None, num_nodes: Optional[int] = None, return_e_id: bool = True, transform: Optional[Callable] = None, **kwargs)[source]
The neighbor sampler from the “Inductive Representation Learning on Large Graphs” paper, which allows for mini-batch training of GNNs on large-scale graphs where full-batch training is not feasible.
Given a GNN with \(L\) layers and a specific mini-batch of nodes node_idx for which we want to compute embeddings, this module iteratively samples neighbors and constructs bipartite graphs that simulate the actual computation flow of GNNs.
More specifically, sizes denotes how many neighbors we want to sample for each node in each layer. This module then takes in these sizes and iteratively samples sizes[l] neighbors for each node involved in layer l. In the next layer, sampling is repeated for the union of nodes that were already encountered. The actual computation graphs are then returned in reverse-mode, meaning that we pass messages from a larger set of nodes to a smaller one, until we reach the nodes for which we originally wanted to compute embeddings.
Hence, an item returned by NeighborSampler holds the current batch_size, the IDs n_id of all nodes involved in the computation, and a list of bipartite graph objects via the tuple (edge_index, e_id, size), where edge_index represents the bipartite edges between source and target nodes, e_id denotes the IDs of original edges in the full graph, and size holds the shape of the bipartite graph. For each bipartite graph, target nodes are also included at the beginning of the list of source nodes so that one can easily apply skip-connections or add self-loops.
Warning
NeighborSampler is deprecated and will be removed in a future release. Use torch_geometric.loader.NeighborLoader instead.
Note
For an example of using NeighborSampler, see examples/reddit.py or examples/ogbn_products_sage.py.
- Parameters
edge_index (Tensor or SparseTensor) – A torch.LongTensor or a torch_sparse.SparseTensor that defines the underlying graph connectivity/message passing flow. edge_index holds the indices of a (sparse) symmetric adjacency matrix. If edge_index is of type torch.LongTensor, its shape must be defined as [2, num_edges], where messages from nodes edge_index[0] are sent to nodes in edge_index[1] (in case flow="source_to_target"). If edge_index is of type torch_sparse.SparseTensor, its sparse indices (row, col) should relate to row = edge_index[1] and col = edge_index[0]. The major difference between both formats is that we need to input the transposed sparse adjacency matrix.
sizes ([int]) – The number of neighbors to sample for each node in each layer. If set to sizes[l] = -1, all neighbors are included in layer l.
node_idx (LongTensor, optional) – The nodes that should be considered for creating mini-batches. If set to None, all nodes will be considered.
num_nodes (int, optional) – The number of nodes in the graph. (default: None)
return_e_id (bool, optional) – If set to False, will not return original edge indices of sampled edges. This is only useful when operating on graphs without edge features, to save memory. (default: True)
transform (callable, optional) – A function/transform that takes in a sampled mini-batch and returns a transformed version. (default: None)
**kwargs (optional) – Additional arguments of torch.utils.data.DataLoader, such as batch_size, shuffle, drop_last or num_workers.
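The bipartite structure described above (targets placed first among the sources, graphs returned in reverse order) can be traced by hand with a small, library-free sketch. It assumes a hypothetical adjacency dict mapping each node to its message-sending neighbors and omits e_id for brevity, so it illustrates the layout rather than reproducing the sampler.

```python
import random


def sample_adjs(adj, batch, sizes, rng):
    """Return (n_id, adjs) in the spirit of NeighborSampler (illustrative only)."""
    n_id, adjs = list(batch), []
    for size in sizes:
        targets = n_id               # nodes that receive messages in this hop
        sources = list(targets)      # targets lead the list of source nodes
        local = {n: i for i, n in enumerate(sources)}
        edges = []
        for j, dst in enumerate(targets):
            neighbors = adj.get(dst, [])
            picked = neighbors if size < 0 else rng.sample(neighbors, min(size, len(neighbors)))
            for src in picked:
                if src not in local:
                    local[src] = len(sources)
                    sources.append(src)
                edges.append((local[src], j))  # (source index, target index)
        adjs.append((edges, (len(sources), len(targets))))
        n_id = sources               # the next hop samples for all of them
    return n_id, adjs[::-1]          # largest bipartite graph first


adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
n_id, adjs = sample_adjs(adj, batch=[0], sizes=[-1, -1], rng=random.Random(0))
print(n_id)                        # [0, 1, 2, 3]
print([size for _, size in adjs])  # [(4, 3), (3, 1)]
```

Reading the sizes left to right shows the message flow: four source nodes feed three targets, which in turn feed the single node of the original mini-batch.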
- class ImbalancedSampler(dataset: Union[Data, Dataset, List[Data]], input_nodes: Optional[Tensor] = None, num_samples: Optional[int] = None)[source]
A weighted random sampler that randomly samples elements according to class distribution. As such, it will either remove samples from the majority class (under-sampling) or add more examples from the minority class (over-sampling).
Graph-level sampling:

    from torch_geometric.loader import DataLoader, ImbalancedSampler

    sampler = ImbalancedSampler(dataset)
    loader = DataLoader(dataset, batch_size=64, sampler=sampler, ...)

Node-level sampling:

    from torch_geometric.loader import NeighborLoader, ImbalancedSampler

    sampler = ImbalancedSampler(data, input_nodes=data.train_mask)
    loader = NeighborLoader(data, input_nodes=data.train_mask,
                            batch_size=64, num_neighbors=[-1, -1],
                            sampler=sampler, ...)

- Parameters
dataset (Dataset or Data) – The dataset from which to sample the data, either given as a Dataset or Data object.
input_nodes (Tensor, optional) – The indices of nodes that are used by the corresponding loader, e.g., by NeighborLoader. If set to None, all nodes will be considered. This argument should only be set for node-level loaders and does not have any effect when operating on a set of graphs as given by Dataset. (default: None)
num_samples (int, optional) – The number of samples to draw for a single epoch. If set to None, will sample as many elements as exist in the underlying data. (default: None)
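Sampling "according to class distribution" is commonly realized with inverse class-frequency weights, and the sketch below assumes that scheme. It uses only the standard library, with random.choices standing in for a weighted random sampler that draws with replacement; the function name is hypothetical.

```python
import random
from collections import Counter


def class_balanced_weights(labels):
    """Weight each element by the inverse frequency of its class."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]


labels = [0, 0, 0, 0, 0, 0, 1, 1]
weights = class_balanced_weights(labels)
print(weights[0], weights[-1])  # 0.16666666666666666 0.5

# Draw one epoch's worth of indices with replacement.
rng = random.Random(0)
indices = rng.choices(range(len(labels)), weights=weights, k=len(labels))
```

With these weights every class has the same total probability mass, so minority-class elements are drawn more often (over-sampling) and majority-class elements less often (under-sampling).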