Transaction Data & Document Data


r_subheading-Course Description-r_end We'll discuss two significant categories of record data: document data and transaction data. This video includes descriptions and examples to help you grasp the concepts. r_break r_break r_subheading-What You'll Learn-r_end • Document data and its examples. r_break • Transaction data and its examples. r_break • The differences between document and transaction data.


Another useful subcategory of record data is document data. It is somewhat similar to a data matrix: every entry, every data attribute, has a numeric value, but here the values are counts, so they are discrete. Each row, each data object, is represented by what we think of as a term vector. r_break r_break There are several ways to build a term vector, but in this case it just counts the number of times a given word appears in the document. So, in document 1, ‘team’ appears three times and ‘play’ appears five times, but ‘coach’ does not appear. In document 2, on the other hand, ‘coach’ appears seven times, but ‘play’ never appears. Because these attributes are all discrete integer counts, different kinds of algorithms and processing methods are appropriate here than for general data matrices or mixed data. r_break r_break All right, so the last special kind of record data that we’re going to talk about here is transaction data. It has some similarities to document data, and you can use some of the same analysis, but the semantics are different. Transaction data is exactly what it sounds like: it’s record data where each record involves a set of items. r_break r_break So if we’re at a grocery store, the set of products purchased by a customer during one shopping trip constitutes a transaction, and the individual products that were purchased are the items. The difference between this and document data is that these items usually have more information associated with them than just a count. So, it’s not only bread: there’s a price associated with it, maybe an inventory level, how many are left, all of those sorts of things.
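As a rough illustration of the grocery-store description, here is a minimal Python sketch of transaction data in which each item carries more than a count. The `Item` class, the `catalog`, the prices (stored in cents to avoid floating-point rounding), and the stock levels are all hypothetical values made up for this example; they do not come from the video.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One product; every field value below is a hypothetical example."""
    name: str
    price_cents: int  # unit price in cents (assumed)
    in_stock: int     # units left in inventory (assumed)


# Hypothetical store catalog keyed by item name.
catalog = {
    "bread": Item("bread", 250, 40),
    "milk": Item("milk", 120, 15),
    "eggs": Item("eggs", 300, 8),
}

# Each transaction is the set of items bought on one shopping trip,
# stored here as a mapping from item name to quantity purchased.
transactions = [
    {"bread": 1, "milk": 2},
    {"bread": 2, "eggs": 1},
    {"milk": 1},
]


def basket_total(basket, catalog):
    """Sum price * quantity over the items in a single transaction."""
    return sum(catalog[name].price_cents * qty for name, qty in basket.items())


totals = [basket_total(basket, catalog) for basket in transactions]
print(totals)
```

The point of the sketch is that a transaction record only lists the items and their quantities, while attributes such as price and remaining stock live with the item itself and have to be looked up during analysis.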
r_break r_break So, we can do things similar to document analysis, but there are other sorts of information we have to consider as well. That’s transaction data.
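To make the parallel with document data concrete, the sketch below builds count-based term vectors in Python and then applies the same function to a shopping basket. The helper name `term_vector`, the token lists, and the basket contents are illustrative assumptions. The counts for document 1 follow the transcript; the transcript does not give the ‘team’ count for document 2, so it is assumed to be 0 here.

```python
from collections import Counter


def term_vector(tokens, vocabulary):
    """Count how often each vocabulary entry occurs in the token list."""
    counts = Counter(tokens)
    # Counter returns 0 for entries that never occur.
    return [counts[term] for term in vocabulary]


vocabulary = ["team", "coach", "play"]

# Document 1: counts taken from the transcript.
doc1 = ["team"] * 3 + ["play"] * 5
# Document 2: 'coach' x7 and no 'play' per the transcript;
# the absence of 'team' is an assumption for this example.
doc2 = ["coach"] * 7

doc_vectors = [term_vector(doc, vocabulary) for doc in (doc1, doc2)]
print(doc_vectors)

# The same counting works for a transaction: items play the role of terms.
item_vocabulary = ["bread", "milk", "eggs"]
basket = ["bread", "milk", "milk"]  # hypothetical shopping trip
basket_vector = term_vector(basket, item_vocabulary)
print(basket_vector)
```

The basket vector has the same shape as a term vector, which is why some document-style analysis carries over. What it leaves out is the per-item information, such as price or stock, that transaction analysis also has to take into account.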

Data Science Dojo Instructor - Data Science Dojo is a paradigm shift in data science learning. We enable all professionals (and students) to extract actionable insights from data.