1. Timestamp
1.1 Create a timestamp
- Custom timestamp
- Syntax: pd.Timestamp(ts_input, tz, unit, year, month, day, hour, minute, second, microsecond, nanosecond, tzinfo)
- Code example:
import pandas as pd
import pytz

# When ts_input is a string, it is usually combined with the tz parameter
timestamp = pd.Timestamp(ts_input="2023-01-05", tz=pytz.timezone("Asia/Shanghai"))
print(timestamp)  # 2023-01-05 00:00:00+08:00

# When ts_input is a number, it is usually combined with the unit parameter
timestamp = pd.Timestamp(ts_input=1672909342.246457, unit="s")
print(timestamp)  # 2023-01-05 09:02:22.246457100

# When ts_input is omitted, specify the components (year, month, day, hour, minute, second, ...) instead
timestamp = pd.Timestamp(year=2023, month=1, day=5, hour=17, minute=8, second=34)
print(timestamp)  # 2023-01-05 17:08:34
- Get the current timestamp
print(pd.Timestamp.now())  # 2023-01-05 17:48:56.629418 (local time, naive)
print(pd.Timestamp.utcnow())  # 2023-01-05 09:48:56.629418+00:00 (UTC, time-zone-aware)
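As a quick consistency check on the three construction routes above, the following sketch builds the same instant from a string, from separate components, and from an epoch value in seconds. The instant and the variable names are chosen here purely for illustration.

```python
import pandas as pd

# Three ways to describe 2023-01-05 17:08:34 (no time zone attached)
from_string = pd.Timestamp("2023-01-05 17:08:34")
from_parts = pd.Timestamp(year=2023, month=1, day=5, hour=17, minute=8, second=34)
from_epoch = pd.Timestamp(1672938514, unit="s")  # whole seconds since 1970-01-01

print(from_string == from_parts == from_epoch)  # True
```

Using an integer number of seconds avoids the floating-point rounding visible in the trailing digits of the float example.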
1.2 Common methods and attributes of Timestamp
1.2.1 Common methods of Timestamp
- ts.tz_localize(tz)
Function: Attach a time zone to a naive (time-zone-unaware) timestamp; the wall-clock reading stays the same
Parameters: tz: time zone identifier
ts = pd.Timestamp("2022-01-06")
print(ts.tz)  # None
ts = ts.tz_localize("Asia/Shanghai")  # Localized to Beijing time
print(ts)  # 2022-01-06 00:00:00+08:00
print(ts.value)  # 1641398400000000000, nanosecond timestamp
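To make the effect of localization concrete, here is a minimal sketch (variable names are illustrative) showing that tz_localize keeps the wall-clock reading but shifts the underlying instant by the UTC offset, and that it refuses input that already has a time zone.

```python
import pandas as pd

naive = pd.Timestamp("2022-01-06")
aware = naive.tz_localize("Asia/Shanghai")

# The wall-clock reading is unchanged, but the underlying instant moves by
# the UTC offset (8 hours for Asia/Shanghai).
offset_ns = naive.value - aware.value
print(pd.Timedelta(offset_ns, unit="ns"))  # 0 days 08:00:00

# tz_localize only accepts naive input; an aware timestamp raises TypeError.
try:
    aware.tz_localize("UTC")
    raised = False
except TypeError:
    raised = True
print(raised)  # True
```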
1.2.2 Common attributes of Timestamp
- ts.value (view the integer nanosecond timestamp)
ts = pd.Timestamp("2022-01-06")
print(ts.value)  # 1641427200000000000, nanosecond timestamp (naive, so read as UTC)
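Since ts.value is always expressed in nanoseconds, coarser epoch units follow by integer division. The sketch below, with illustrative variable names, derives seconds and milliseconds and compares them with ts.timestamp(), which returns POSIX seconds as a float.

```python
import pandas as pd

ts = pd.Timestamp("2022-01-06", tz="UTC")

seconds = ts.value // 10**9        # nanoseconds -> whole seconds
milliseconds = ts.value // 10**6   # nanoseconds -> whole milliseconds
print(seconds)         # 1641427200
print(milliseconds)    # 1641427200000
print(ts.timestamp())  # 1641427200.0

# An integer passed to pd.Timestamp is read as nanoseconds by default.
print(pd.Timestamp(ts.value, tz="UTC") == ts)  # True
```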
1.3 Time zone and time zone conversion
1.3.1 Time zone
Time zone information in Python is available through the third-party library pytz.
(1) List the available time zones
The pytz package exposes two attributes, all_timezones and common_timezones, that list the available time zones.
import pytz

# The exact counts depend on the installed pytz version
print(len(pytz.all_timezones))  # 595
print(pytz.all_timezones[:5])  # ['Africa/Abidjan', 'Africa/Accra', 'Africa/Addis_Ababa', 'Africa/Algiers', 'Africa/Asmara']

print(len(pytz.common_timezones))  # 437
print(pytz.common_timezones[:5])  # ['Africa/Abidjan', 'Africa/Accra', 'Africa/Addis_Ababa', 'Africa/Algiers', 'Africa/Asmara']
(2) Get the time zone object
Use pytz.timezone(zone) to obtain a time zone object, where zone is the time zone identifier. For example, the identifier for Shanghai, China is "Asia/Shanghai".
import pytz

tz = pytz.timezone('Asia/Shanghai')
print(repr(tz))  # <DstTzInfo 'Asia/Shanghai' LMT+8:06:00 STD>
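Before calling pytz.timezone, it can help to check whether an identifier is known. The following sketch uses a hypothetical helper, is_known_zone, and a made-up identifier "Asia/Atlantis"; neither comes from the original text.

```python
import pytz


def is_known_zone(name):
    """Return True if pytz recognizes the time zone identifier (hypothetical helper)."""
    return name in pytz.all_timezones_set


print(is_known_zone("Asia/Shanghai"))  # True
print(is_known_zone("Asia/Atlantis"))  # False

# Every common zone is also contained in the full list.
print(set(pytz.common_timezones) <= set(pytz.all_timezones))  # True

# An unknown identifier makes pytz.timezone raise UnknownTimeZoneError.
try:
    pytz.timezone("Asia/Atlantis")
    unknown_raised = False
except pytz.UnknownTimeZoneError:
    unknown_raised = True
print(unknown_raised)  # True
```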
1.3.2 Time zone conversion
(1) Convert UTC to other time zones (two ways)
- timestamp.astimezone(tz=None) -> Timestamp
- code example
import pandas as pd

utc_ts = pd.Timestamp("2022-01-05 11:45:14", tz="utc")
print(utc_ts)  # 2022-01-05 11:45:14+00:00
beijing_ts = utc_ts.astimezone(tz="Asia/Shanghai")
print(beijing_ts)  # 2022-01-05 19:45:14+08:00
- timestamp.tz_convert(tz=None) -> Timestamp
- code example
import pandas as pd

utc_ts = pd.Timestamp("2022-01-05 11:45:14", tz="utc")
print(utc_ts)  # 2022-01-05 11:45:14+00:00
beijing_ts = utc_ts.tz_convert(tz="Asia/Shanghai")
print(beijing_ts)  # 2022-01-05 19:45:14+08:00
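The two methods above can be compared directly. This short sketch (variable names are illustrative) checks that they return the same result and that the conversion changes only how the instant is displayed, not the instant itself.

```python
import pandas as pd

utc_ts = pd.Timestamp("2022-01-05 11:45:14", tz="UTC")
via_astimezone = utc_ts.astimezone("Asia/Shanghai")
via_tz_convert = utc_ts.tz_convert("Asia/Shanghai")

print(via_astimezone == via_tz_convert)      # True
print(via_tz_convert)                        # 2022-01-05 19:45:14+08:00
print(via_tz_convert.value == utc_ts.value)  # True, same nanosecond value
print(via_tz_convert == utc_ts)              # True, same instant
```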
(2) Convert other time zones to UTC (the same approach works for conversion between any two time zones)
- pd.DataFrame.tz_localize(tz, axis=0, level=None, copy=True, ambiguous='raise', nonexistent='raise') -> Series | DataFrame
- Parameter introduction:
tz: string or pytz.timezone object
axis: the axis to localize
level: if the axis is a MultiIndex, the specific level to localize; otherwise must be None
copy: whether to also copy the underlying data
ambiguous: how to handle ambiguous times, which occur when clocks are moved backward due to DST
nonexistent: how to handle nonexistent times, which occur in a time zone when clocks are moved forward due to DST
- Code example:
Simulate a set of time series data whose times are taken to be Beijing time. The goal is to convert these times to UTC and then generate numeric timestamps.
import numpy as np
import pandas as pd

grade = np.random.uniform(52, 100, 200).astype(np.int64)
exam_dates = pd.date_range("2023-01-01", periods=200, freq="h")  # Beijing time; use freq="H" on pandas older than 2.2
data = pd.DataFrame(data={"grade": grade})
data["date"] = exam_dates
data.set_index("date", inplace=True)
One point deserves special attention: with respect to time zones, pandas has two kinds of time series (whose elements are essentially Timestamp objects). The first is the naive kind, which carries no time zone; this is the default. The second is the time-zone-aware kind. An aware time series (or timestamp) stores a nanosecond-level UTC timestamp, and this value does not change during time zone conversion. Use the ts.tz attribute to view the time zone of a time series, and ts.value to view the nanosecond timestamp of a single Timestamp:
print(data.index.tz) # None, no time zone by default
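The naive/aware distinction can be checked on a small index. The sketch below uses a three-day index invented for illustration; asi8 is the integer representation that a DatetimeIndex stores internally.

```python
import pandas as pd

naive_idx = pd.date_range("2023-01-01", periods=3, freq="D")
aware_idx = naive_idx.tz_localize("Asia/Shanghai")
utc_idx = aware_idx.tz_convert("UTC")

print(naive_idx.tz)  # None
print(aware_idx.tz)  # Asia/Shanghai

# tz_convert changes only the displayed zone; the stored integers are equal.
print((aware_idx.asi8 == utc_idx.asi8).all())  # True

# A naive index cannot be converted until it has been localized.
try:
    naive_idx.tz_convert("UTC")
    raised = False
except TypeError:
    raised = True
print(raised)  # True
```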
Therefore, before converting this time series to another time zone, we must first decide which time zone it is in. If we take the series to be Beijing time, we first attach that time zone to it, that is, localize it to the Beijing time zone with the tz_localize(tz="Asia/Shanghai") method.
data_bj = data.tz_localize("Asia/Shanghai")
print(data_bj.index.tz)  # Asia/Shanghai
print(data_bj)
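Localizing to Asia/Shanghai is straightforward because that zone currently observes no DST. For zones that do, the ambiguous and nonexistent parameters of tz_localize decide what happens at the transitions. The sketch below assumes Europe/Berlin and its 2023 transition dates purely as an illustration; the frames and column name are made up.

```python
import numpy as np
import pandas as pd

# 02:30 on 2023-03-26 does not exist in Europe/Berlin (clocks jump 02:00 -> 03:00).
spring = pd.DataFrame({"x": [1]}, index=pd.DatetimeIndex(["2023-03-26 02:30:00"]))
shifted = spring.tz_localize("Europe/Berlin", nonexistent="shift_forward")
as_nat = spring.tz_localize("Europe/Berlin", nonexistent="NaT")
print(shifted.index[0])  # 2023-03-26 03:00:00+02:00
print(as_nat.index[0])   # NaT

# 02:30 on 2023-10-29 happens twice (clocks fall back 03:00 -> 02:00).
autumn = pd.DataFrame(
    {"x": [1, 2]},
    index=pd.DatetimeIndex(["2023-10-29 02:30:00", "2023-10-29 02:30:00"]),
)
# A boolean array marks which rows are DST (True) and which are standard time (False).
resolved = autumn.tz_localize("Europe/Berlin", ambiguous=np.array([True, False]))
print(resolved.index[0])  # 2023-10-29 02:30:00+02:00
print(resolved.index[1])  # 2023-10-29 02:30:00+01:00
```

With the default 'raise' for both parameters, the same inputs would produce an error instead of a silently chosen time.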
The time series now carries time zone information, so we can convert it to another time zone with the tz_convert(tz="utc") method.
data_utc = data_bj.tz_convert(tz="utc")
print(data_utc.index.tz)
print(data_utc)
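Because the simulated grades are random, the result is easier to verify on a tiny deterministic frame. This sketch repeats the localize-then-convert steps on three made-up rows and confirms that both calls return new objects and leave the data column untouched.

```python
import pandas as pd

idx = pd.date_range("2023-01-01", periods=3, freq="D")  # read as Beijing time
frame = pd.DataFrame({"grade": [90, 75, 60]}, index=idx)

frame_bj = frame.tz_localize("Asia/Shanghai")  # attach the zone
frame_utc = frame_bj.tz_convert("UTC")         # move to UTC

print(frame_utc.index[0])           # 2022-12-31 16:00:00+00:00
print(frame_utc["grade"].tolist())  # [90, 75, 60]
print(frame.index.tz)               # None, the original frame is not modified
```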
Beijing time has now been converted to UTC. The result is still time-zone-aware, however, as the '+00:00' suffix shows. To remove the suffix, convert the aware index to a naive one.
data_utc_naive = data_utc.tz_convert(None)
print(data_utc_naive)
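There are two ways to drop the time zone, and they differ in which wall-clock reading survives. The following sketch, on a single made-up Beijing timestamp, contrasts tz_convert(None) with tz_localize(None).

```python
import pandas as pd

bj = pd.DatetimeIndex(["2023-01-01 08:00:00"]).tz_localize("Asia/Shanghai")

utc_wall = bj.tz_convert(None)     # convert to UTC, then drop the zone
local_wall = bj.tz_localize(None)  # drop the zone, keep the local reading

print(utc_wall[0])    # 2023-01-01 00:00:00
print(local_wall[0])  # 2023-01-01 08:00:00
```

For producing UTC-based numeric timestamps, the tz_convert(None) form is the one that matches the walkthrough.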
If we need to go one step further and convert the dates into numeric timestamps, there are two ways to do it:
① By definition: subtract the epoch "1970-01-01" from each time and divide the difference by the desired unit
data_utc_naive["dtime1"] = (data_utc_naive.index - pd.Timestamp("1970-01-01")) // pd.Timedelta('1ms')  # UTC time to millisecond timestamp
print(data_utc_naive)
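The subtraction approach can be verified with a round trip. This sketch uses a small made-up index, computes millisecond timestamps, and rebuilds the dates with pd.to_datetime.

```python
import pandas as pd

idx = pd.date_range("2023-01-01", periods=3, freq="D")  # naive, read as UTC
epoch = pd.Timestamp("1970-01-01")

ms = (idx - epoch) // pd.Timedelta("1ms")
print(ms.tolist())  # [1672531200000, 1672617600000, 1672704000000]

# Converting the integers back should reproduce the original dates.
restored = pd.to_datetime(ms, unit="ms")
print((restored == idx).all())  # True
```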
② The NumPy array behind the index (index.values) has a view(dtype) method, which reinterprets the datetime values as integers
# The integers produced by view are in nanoseconds (for a datetime64[ns] index), so they must be scaled to the precision required.
# Sub-second units: 1 s = 1,000 ms = 1,000,000 us = 1,000,000,000 ns
data_utc_naive["dtime2"] = data_utc_naive.index.values.view(dtype=np.int64) // 1_000_000  # ns -> ms
print(data_utc_naive)
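The scaling factors for the view approach can be written out explicitly. The sketch below is an illustration on a made-up two-day index; the constant names are not from the original, and the array is cast to datetime64[ns] first so that the divisors hold regardless of the index's stored resolution.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2023-01-01", periods=2, freq="D")  # naive, read as UTC

# Force nanosecond resolution so the divisors below are valid.
ns = idx.values.astype("datetime64[ns]").view(np.int64)

NS_PER_US = 1_000
NS_PER_MS = 1_000_000
NS_PER_S = 1_000_000_000

print((ns // NS_PER_S).tolist())   # [1672531200, 1672617600]
print((ns // NS_PER_MS).tolist())  # [1672531200000, 1672617600000]

# The result agrees with the epoch-subtraction approach.
by_subtraction = (idx - pd.Timestamp("1970-01-01")) // pd.Timedelta("1ms")
print(by_subtraction.tolist() == (ns // NS_PER_MS).tolist())  # True
```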