AC Daily Log Part 3: Pre-process 2018 Data

Andrew Chien
May 13, 2020


This article covers how we clean the 2018 daily log data. The 2018 daily log comes in two formats: one covers January to March and the other covers April to December. We will start with the January to March log. As the image below shows, its format is closer to the 2020 daily log than the 2017 log is. However, there are still two issues. First, film and reading content are mixed in the same cell. Second, as in the 2017 daily log, dates are not unique and are listed in descending order. This article focuses on the second problem; the first will be discussed in a later article.

2018 daily log format

To make each date unique in the data frame, we create a Python dictionary that stores each date together with the data from the other columns. In the following code, the first and second for loops create the date dictionary. The third for loop aggregates rows that share the same date and puts them into the dictionary. The last for loop writes the dictionary's data into the data frame created at the beginning.

Daily log in 2018 from January to March
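The original code is only available as an image, so here is a condensed sketch of the same dictionary idea. It is not the author's exact code. It assumes the sheet rows arrive as lists of strings with hypothetical columns "Date", "Life/Work" and "Film". It also assumes that duplicate entries for a date are joined with ", " and that dates are zero-padded "YYYY/MM/DD" strings, so that sorting them as strings gives chronological order.

```python
import pandas as pd

# Hypothetical rows as they might come from the January-March sheet:
# dates repeat and appear in descending order.
rows = [
    ["2018/03/02", "Gym", "Movie B"],
    ["2018/03/02", "Meeting", ""],
    ["2018/03/01", "Study", "Movie A"],
]
columns = ["Date", "Life/Work", "Film"]

# Build a dictionary keyed by date; each value holds one list per non-date column.
date_dict = {}
for date, *values in rows:
    if date not in date_dict:
        date_dict[date] = [[] for _ in columns[1:]]
    # Aggregate this row's cells into the lists for its date.
    for bucket, value in zip(date_dict[date], values):
        if value:  # skip empty cells so they don't add stray separators
            bucket.append(value)

# Write the aggregated entries into a data frame: one row per date,
# in ascending date order (string sort works for zero-padded YYYY/MM/DD).
records = []
for date in sorted(date_dict):
    records.append([date] + [", ".join(bucket) for bucket in date_dict[date]])
df_20180103 = pd.DataFrame(records, columns=columns)
print(df_20180103)
```

If every "Life/Work" cell for a date is empty, the joined value is an empty string. The filter in the next step removes exactly those rows.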

After building the data frame for the 2018 daily log from January to March, we remove the rows that carry no value. The following code keeps only the rows whose "Life/Work" column is not an empty string.

df_20180103 = df_20180103[df_20180103['Life/Work'] != ""]
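The line above is a boolean-mask filter. The toy example below shows what it keeps, using made-up data. The comparison `!= ""` only drops cells that are exactly an empty string. A missing value (None/NaN) or a whitespace-only cell passes the filter. The second mask is a more defensive variant that treats those as empty too. It is a suggestion, not part of the original code.

```python
import pandas as pd

# Toy data (hypothetical): one empty string, one whitespace-only cell, one missing value.
df_raw = pd.DataFrame({
    "Date": ["2018/01/01", "2018/01/02", "2018/01/03", "2018/01/04"],
    "Life/Work": ["Study", "", "   ", None],
})

# The filter from the article: keep rows whose 'Life/Work' is not exactly "".
df_exact = df_raw[df_raw["Life/Work"] != ""]

# Defensive variant: treat missing and whitespace-only cells as empty as well.
df_clean = df_raw[df_raw["Life/Work"].fillna("").str.strip() != ""]

print(len(df_exact), len(df_clean))  # 3 1
```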

Next, we handle the rest of the 2018 daily log (April to December), which is simpler. In the following code, all the values are first fetched from the Google Sheet and then stored in the data frame created at the beginning.

Daily log in 2018 from April to December
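Here is a minimal sketch of the "values into a data frame" step. To keep it runnable offline, it uses a hard-coded `values` list in place of the real sheet. The list has the shape returned by the gspread library's `worksheet.get_all_values()`: a list of rows, each a list of strings. The assumption that the first row holds the column headers, the column names, and the name `df_20180412` are all hypothetical.

```python
import pandas as pd

# Stand-in for the values fetched from the Google Sheet: a list of rows,
# each row a list of strings, with the header row first (assumed).
values = [
    ["Date", "Life/Work", "Film"],
    ["2018/04/01", "Study", "Movie C"],
    ["2018/04/02", "Gym", ""],
]

# Use the first row as column names and the remaining rows as data.
header, *data = values
df_20180412 = pd.DataFrame(data, columns=header)
print(df_20180412)
```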

Last but not least, the data needs to be merged. The following code shows how the 2017 and 2018 data is merged with the concat function from the Python pandas package.

Merge data from 2017 to 2018
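The merge step can be sketched with `pd.concat`. The three one-row frames below are hypothetical stand-ins for the cleaned 2017 data and the two 2018 parts. `ignore_index=True` is a design choice that gives the result a fresh 0..n-1 index instead of repeating each frame's own index labels. `concat` aligns columns by name, so the frames should share the same column names. Otherwise the result gains extra columns filled with NaN.

```python
import pandas as pd

# Hypothetical stand-ins for the three cleaned frames (same column names).
df_2017 = pd.DataFrame({"Date": ["2017/12/31"], "Life/Work": ["Work"], "Film": ["Movie X"]})
df_20180103 = pd.DataFrame({"Date": ["2018/01/01"], "Life/Work": ["Study"], "Film": ["Book Y"]})
df_20180412 = pd.DataFrame({"Date": ["2018/04/01"], "Life/Work": ["Gym"], "Film": ["Movie C"]})

# Stack the frames vertically in chronological order and rebuild the index.
df_2017_2018 = pd.concat([df_2017, df_20180103, df_20180412], ignore_index=True)
print(df_2017_2018)
```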

Afterward, we get the following result. However, the "Film" column shows that the data is not completely clean: it actually includes both the books I read and the videos I watched. This will be discussed further in the daily log analysis part.

The result of the merge of daily log data from 2017 to 2018

Thank you for taking the time to read this. If you have any feedback, don't hesitate to leave a comment below. Next time, I will talk about merging the combined 2017 and 2018 daily log with the 2019 one, and how the data is updated to Google Sheets.

People often say that motivation doesn’t last. Well, neither does bathing — that’s why we recommend it daily.

Zig Ziglar

Photo by Laurenz Kleinheider on Unsplash
