Typo Dataset (Parallel Format)
Typo Dataset (CSV with Metadata)
This resource contains two datasets, each provided at three levels of artificial error generation. The source datasets are the Amazon Fine Food Review and Large Movie Review datasets. The original text was cleaned of typographical errors before the artificial errors were introduced.
Here is the structure of the resource
All the files are in parallel format.
The same dataset is also available in CSV format. The CSV dataset has a similar structure: each parallel file becomes a column of the same name in a CSV file. It also contains metadata such as word count and out-of-vocabulary words.
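As a rough illustration of how the parallel files relate to the CSV columns, here is a minimal Python sketch. It assumes that line i of every parallel file refers to the same review. The column names `clean` and `noisy`, the helper functions, and the definition of word count as whitespace-separated tokens are hypothetical choices for this example, not the actual names or method used in this resource. Out-of-vocabulary words are left out because the vocabulary is not specified here.

```python
import csv
import io


def parallel_to_rows(columns):
    """Align parallel files into rows.

    `columns` maps a column name to the list of lines read from one
    parallel file. Line i of every list is assumed to describe the
    same review.
    """
    lengths = {len(lines) for lines in columns.values()}
    if len(lengths) > 1:
        raise ValueError("parallel files must have the same number of lines")
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


def add_word_count(rows, source_column):
    """Add a `word_count` field (hypothetical name) to each row.

    Words are counted as whitespace-separated tokens, which is an
    assumption made for this sketch.
    """
    for row in rows:
        row["word_count"] = len(row[source_column].split())
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text with one column per parallel file."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample lines standing in for two parallel files.
sample = {
    "clean": ["good food", "a great movie"],
    "noisy": ["godo food", "a graet movie"],
}
rows = add_word_count(parallel_to_rows(sample), "clean")
csv_text = rows_to_csv(rows)
```

With real data, each list would come from reading one parallel file line by line, for example with `open(path, encoding="utf-8").read().splitlines()`.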
Links to original datasets: Amazon Fine Food Review, Large Movie Review Dataset