Following the article AC Daily Log, this article looks more closely at how the 2017 daily log data is cleaned. The following image shows the daily log format in 2020 and in 2017. As we can see, the 2017 format is quite different from the 2020 one. The first difference is how the date is presented: the 2017 daily log does not have one unique row per date. Second, life and work, film, and read are separate fields in the 2020 daily log, whereas the 2017 daily log only has work and private, with private covering life, film and read. Finally, for numeric data such as exercise or meditation time, the 2017 daily log uses the habit field as the column that stores this information, but as strings with the unit embedded in the text.
To make the data merge easier, the work, private and habit fields of the 2017 daily log will be merged into one field that matches the life/work field of the 2020 daily log. First, we need to get the data from the Google sheet; if you don’t know how to access it, please check here. Once the data is retrieved, it has the format shown in the following image.
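As a rough illustration of the field merge described above, here is a minimal sketch in plain Python. The row layout (date, work, private, habit), the sample values, the `"; "` separator and the names `rows_2017` and `merge_fields` are all assumptions made for the example, not the actual sheet contents.

```python
# Hypothetical rows as they might come back from the sheet:
# [date, work, private, habit]. The real layout may differ.
rows_2017 = [
    ["2017-03-12", "write report", "watch a film", "exercise 30 min"],
    ["2017-03-12", "team meeting", "", ""],
    ["2017-03-13", "", "read a novel", "meditation 10 min"],
]


def merge_fields(row, sep="; "):
    """Join the non-empty work, private and habit cells into one string."""
    date, *fields = row
    merged = sep.join(cell.strip() for cell in fields if cell.strip())
    return date, merged


# One (date, merged text) pair per original row
merged_rows = [merge_fields(row) for row in rows_2017]
```

Empty cells are skipped so that the merged field does not end up with dangling separators.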
After getting the data, we have to put it into the data frame we created following the 2020 daily log format. In preparation, we create a dictionary that stores the data in date order, because a date can appear on more than one row in the 2017 daily log.
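One possible way to build such a dictionary is sketched below. It assumes the rows have already been reduced to (date, text) pairs; the names `merged_rows` and `dic_2017` and the sample values are hypothetical.

```python
from collections import defaultdict

# Hypothetical (date, merged text) pairs; a date may appear several times.
merged_rows = [
    ("2017-03-12", "write report; exercise 30 min"),
    ("2017-03-12", "team meeting"),
    ("2017-03-13", "read a novel"),
]

# Group all entries under their date. Python dictionaries keep insertion
# order, so the keys follow the order in which dates first appear.
grouped = defaultdict(list)
for date, entry in merged_rows:
    if entry:  # skip rows whose merged text is empty
        grouped[date].append(entry)

dic_2017 = dict(grouped)
```

Each date now appears exactly once as a key, which is what the 2020-style data frame needs.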
Once the dictionary for the 2017 daily log is ready, we can put the data into the data frame we created earlier. Before running the data input loop, we create a list called dic_key2017 that holds the dates; the loop iterates over this list to fill the data frame.
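The input loop could look roughly like the following pandas sketch. Only the list name `dic_key2017` comes from the text; the dictionary name `dic_2017`, the frame name `df_2017`, the column names and the newline used to join entries are assumptions for illustration.

```python
import pandas as pd

# Hypothetical dictionary: one key per date, one list of entries per key.
dic_2017 = {
    "2017-03-11": ["test entry"],
    "2017-03-12": ["write report", "team meeting"],
    "2017-03-13": ["read a novel"],
}

# List of dates, in dictionary order, that drives the input loop
dic_key2017 = list(dic_2017.keys())

# Empty frame in a simplified 2020-style layout (column names assumed)
df_2017 = pd.DataFrame(columns=["date", "life/work"])

# Add one row per date, joining that date's entries into a single cell
for i, date in enumerate(dic_key2017):
    df_2017.loc[i] = [date, "\n".join(dic_2017[date])]
```

For a large log, building a list of rows first and passing it to `pd.DataFrame` once would be faster than growing the frame row by row, but the loop mirrors the steps described here.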
However, some rows need to be removed, because the daily log was only written from 12th March onwards. As a result, the following code needs to be implemented:
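A minimal sketch of that row removal, assuming the dates are stored as `YYYY-MM-DD` strings in a column named `date` (the frame name, column names and sample rows are hypothetical):

```python
import pandas as pd

# Hypothetical frame that still contains rows from before the log started
df_2017 = pd.DataFrame({
    "date": ["2017-03-10", "2017-03-11", "2017-03-12", "2017-03-13"],
    "life/work": ["", "", "write report", "read a novel"],
})

# First day of the 2017 daily log, as stated in the text
start_date = pd.Timestamp(2017, 3, 12)

# Parse the date strings, keep rows on or after the start date,
# and renumber the index from zero
parsed = pd.to_datetime(df_2017["date"], format="%Y-%m-%d")
df_2017 = df_2017[parsed >= start_date].reset_index(drop=True)
```

Filtering on parsed dates rather than on row positions keeps the step correct even if the number of leading rows changes.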
Once the redundant rows are removed, we have the complete 2017 daily log. Next time, we'll talk about how to organize the 2018 daily log and merge it with the 2017 one.