A method for extracting information from a dataset (e.g., a document) includes: receiving the dataset at an information handling device; optionally extracting, via optical character recognition implemented by a processor of the information handling device, textual information associated with the dataset; and classifying the dataset into one of a plurality of classes. Classifying the dataset may include: computing, for each of a plurality of window regions of the dataset, a similarity score for each of the plurality of classes; calculating, for each of the plurality of classes, a subset of the highest similarity scores over the plurality of window regions; determining an overall similarity score for each of the plurality of classes; and classifying the dataset as corresponding to the class with the highest overall similarity score.
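The classification steps in the abstract can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the window geometry, or how the overall score is derived, so everything concrete here is an assumption made for illustration: window regions are overlapping runs of tokens, similarity is bag-of-words cosine similarity against a hypothetical per-class term profile, and the overall score is the mean of the `top_k` highest window scores. The names `windows`, `cosine`, `classify`, and `class_profiles` are hypothetical and do not come from the patent.

```python
import math
from collections import Counter


def windows(tokens, size, stride):
    """Split a token list into overlapping window regions (assumed geometry)."""
    if len(tokens) <= size:
        return [tokens]
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, stride)]


def cosine(a, b):
    """Cosine similarity between two term-count mappings (assumed measure)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text, class_profiles, size=5, stride=2, top_k=2):
    """Return (best_label, overall_scores) for already-extracted text."""
    tokens = text.lower().split()
    regions = windows(tokens, size, stride)
    overall = {}
    for label, profile in class_profiles.items():
        # One similarity score per window region for this class.
        scores = [cosine(Counter(region), profile) for region in regions]
        # Keep only the subset of highest window scores.
        best = sorted(scores, reverse=True)[:top_k]
        # Assumption: the overall score is the mean of that subset.
        overall[label] = sum(best) / len(best)
    # The class with the highest overall score wins.
    return max(overall, key=overall.get), overall
```

A call such as `classify(ocr_text, {"invoice": Counter(["invoice", "total", "due"]), ...})` returns the winning label together with the per-class overall scores. Averaging only the top-scoring windows, rather than all of them, lets a few strongly matching regions decide the class even when most of the document is uninformative; whether the patented method aggregates in this way is not stated in the abstract.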
Latest GRAIL, LLC Patents:
- Methylation markers and targeted methylation probe panel
- Systems and methods for performing additive smoothing on low-coverage sequencing data from a nucleic acid sample
- Convolutional neural network systems and methods for data classification
- Methylation-based false positive duplicate marking reduction
- Methods and systems for analyzing nucleic acid sequences using mappability analysis and de novo sequence assembly