We'll discuss two important subcategories of record data: document data and transaction data. This video includes descriptions and examples to help you grasp the concepts more easily.
What You'll Learn
> Document data and its examples
> Transaction data and its examples
> The differences between document and transaction data
Another useful subcategory of record data is document data. It is somewhat similar to a data matrix: every entry, every data attribute, has a numeric value. But in this case, the values are counts, so they’re discrete. So, what we have here is that each row, each data object, is represented by what we think of as a term vector.
There are several ways you can build this term vector, but in this case, it just counts the number of times a given word appears in the document. So, in document 1, "team" appears three times and "play" appears five times, but "coach" does not appear. In document 2, on the other hand, "coach" appears seven times, but "play" never appears. Because these attributes are all discrete, integer-valued counts, different kinds of algorithms and processing methods are more appropriate here than they would be for general data matrices or mixed data.
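To make the term-vector idea concrete, here is a minimal Python sketch. The function name `term_vector`, the three-word vocabulary, and the toy documents are assumptions made for illustration. The documents are built only to reproduce the counts mentioned above. The "team" count for document 2 is not stated, so it is assumed to be zero here.

```python
from collections import Counter

# Hypothetical vocabulary: only these three words are mentioned in the example.
VOCABULARY = ["team", "coach", "play"]


def term_vector(text, vocabulary=VOCABULARY):
    """Count how many times each vocabulary word occurs in the text."""
    counts = Counter(text.lower().split())
    # Counter returns 0 for words that never occur.
    return [counts[word] for word in vocabulary]


# Toy documents constructed to match the counts quoted above.
doc1 = "team " * 3 + "play " * 5
# Assumption: "team" does not occur in document 2 (its count is not given).
doc2 = "coach " * 7

doc1_vector = term_vector(doc1)
doc2_vector = term_vector(doc2)

print(doc1_vector)  # [3, 0, 5]
print(doc2_vector)  # [0, 7, 0]
```

Each document becomes one row of non-negative integers, with one column per vocabulary word.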
All right, so the last special kind of record data that we’re going to talk about here is transaction data. This has some similarities to document data, and you can use some of the same analysis, but there are different semantics around it as well. So, transaction data is exactly what it sounds like: it’s record data where each record involves a set of items. So if we’re at a grocery store, the set of products purchased by a customer during one shopping trip constitutes a transaction, and the individual products that were purchased are the items. The difference between this and document data is that usually, these items have more information than just a count associated with them. So, it’s not only bread: there’s a price associated with it, there’s maybe an inventory level associated with it, how many are left, all of those sorts of things. So, we can do things similar to document analysis, but there are other sorts of information we have to consider as well. That’s transaction data.
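As a rough sketch of how this differs from a term vector, a transaction can be modeled as a record holding a set of items, where each item carries its own attributes. The class names, field names, prices, and stock figures below are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Item:
    """One purchased product, with more attached than a simple count."""
    name: str
    price_cents: int  # hypothetical price, stored in cents
    stock_left: int   # hypothetical inventory level after the sale


@dataclass
class Transaction:
    """The products one customer bought during one shopping trip."""
    transaction_id: int
    items: list = field(default_factory=list)

    def item_names(self):
        # The plain item set, comparable to which words occur in a document.
        return {item.name for item in self.items}

    def total_cents(self):
        # Extra per-item information that document data does not carry.
        return sum(item.price_cents for item in self.items)


# Made-up example values, not taken from the lecture.
trip = Transaction(
    transaction_id=1,
    items=[Item("bread", 250, 12), Item("milk", 120, 30)],
)

names = trip.item_names()
total = trip.total_cents()
print(sorted(names))  # ['bread', 'milk']
print(total)          # 370
```

The item set alone supports the same kind of analysis as document data, while the price and stock fields are the additional information that has to be considered.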