TAResearch 2019-03-15
Categories : Weather, Machine Learning, Pandas, Numpy
- Checking Weather Classification Data
- Re-vectorize data from each night pedestal dataframes in fd_ped_h5 data
Checking Weather Classification Data
Looking at the data saved in fd_ped_h5
vs. fd_ped_vect
vs. fd_ped_vect_nonpadded
to see how far yesterday's discrepancies extend.
y2016m07d15s1
fd_ped_h5
>>> store = pd.HDFStore('y2016m07d15s1_ped_fluct.h5')
>>> store
<class 'pandas.io.pytables.HDFStore'>
File path: y2016m07d15s1_ped_fluct.h5
>>> store.info
<bound method HDFStore.info of <class 'pandas.io.pytables.HDFStore'>
File path: y2016m07d15s1_ped_fluct.h5
>
>>> store.info()
u"<class 'pandas.io.pytables.HDFStore'>\nFile path: y2016m07d15s1_ped_fluct.h5\nEmpty"
Looks empty. Try reprocessing? This is an LR night, so wait until later.
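Rather than eyeballing `store.info()` per night, the emptiness check above can be scripted. A minimal sketch, assuming pandas with PyTables installed; `hdf_is_empty` is a hypothetical helper, not part of the processing scripts:

```python
import pandas as pd

def hdf_is_empty(path):
    """Return True if the HDF5 store at `path` holds no pandas objects,
    like the y2016m07d15s1 file above."""
    with pd.HDFStore(path, mode='r') as store:
        return len(store.keys()) == 0
```

This could be looped over every `*_ped_fluct.h5` file to list the nights that need reprocessing.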
y2007m11d07 (First Training Night)
Night Animation
As the animation created from the dataframe below shows, the night consists of parts 4-30.
First Frame :
Last Frame :
fd_ped_h5
Load HDF5 file with Pandas
>>> store = pd.HDFStore('y2007m11d07s0_ped_fluct.h5')
>>> store.info()
u"<class 'pandas.io.pytables.HDFStore'>\nFile path: y2007m11d07s0_ped_fluct.h5\n/frame_info_df frame (shape->[401,7]) \n/ped_fluct_df frame (shape->[401,3072])\n/ped_fluct_norm_df frame (shape->[401,3072])"
Load frame_info_df :
>>> df = store['frame_info_df']
>>> df.head()
frame_max ... frame_time
0 120.0 ... 2007-11-07 02:45:38.182826622
1 311.0 ... 2007-11-07 02:46:38.182826622
2 402.0 ... 2007-11-07 02:47:38.182826622
3 260.0 ... 2007-11-07 02:48:38.182826622
4 84.0 ... 2007-11-07 02:49:38.182826622
[5 rows x 7 columns]
>>> df.columns
Index([u'frame_max', u'frame_mean', u'frame_min', u'frame_minute',
u'frame_part', u'frame_sigma', u'frame_time'],
dtype='object')
Load ped_fluct_df :
>>> df = store['ped_fluct_df']
>>> df.columns
Int64Index([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
...
3062, 3063, 3064, 3065, 3066, 3067, 3068, 3069, 3070, 3071],
dtype='int64', length=3072)
>>> df.head()
0 1 2 3 4 5 6 ... 3065 3066 3067 3068 3069 3070 3071
0 3 1 4 8 2 8 2 ... 1 1 1 2 1 1 1
1 2 1 3 4 2 6 2 ... 3 2 2 1 2 1 1
2 1 1 3 1 4 1 2 ... 1 1 1 1 1 1 1
3 4 1 2 3 5 4 1 ... 1 1 1 1 1 1 1
4 1 2 1 5 6 3 7 ... 1 1 1 1 1 1 1
Load ped_fluct_norm_df :
>>> df = store['ped_fluct_norm_df']
>>> df.columns
Int64Index([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
...
3062, 3063, 3064, 3065, 3066, 3067, 3068, 3069, 3070, 3071],
dtype='int64', length=3072)
>>> df.head()
0 1 2 3 ... 3068 3069 3070 3071
0 0.479712 0.324762 0.520288 0.618051 ... 0.422525 0.324762 0.324762 0.324762
1 0.372493 0.286307 0.422909 0.458680 ... 0.286307 0.372493 0.286307 0.286307
2 0.277452 0.277452 0.409830 0.277452 ... 0.277452 0.277452 0.277452 0.277452
3 0.469128 0.292828 0.380978 0.432543 ... 0.292828 0.292828 0.292828 0.292828
4 0.341965 0.444906 0.341965 0.580988 ... 0.341965 0.341965 0.341965 0.341965
[5 rows x 3072 columns]
fd_ped_vect_nonpadded
-rw-r--r-- 1 gfurlich gfurlich 128 Dec 4 10:37 y2007m11d07s0_ped_fluct_vectorized.npy
Looks empty… Every training-night file is empty… hmm… Not good.
-rw-r--r-- 1 gfurlich gfurlich 128 Dec 4 13:04 y2017m09d13s0_ped_fluct_vectorized.npy
-rw-r--r-- 1 gfurlich gfurlich 245888 Dec 4 13:04 y2017m09d14s0_ped_fluct_vectorized.npy
-rw-r--r-- 1 gfurlich gfurlich 128 Dec 4 13:04 y2017m09d16s0_ped_fluct_vectorized.npy
Looking at it with Numpy
>>> import numpy as np
>>> np.load('y2007m11d07s0_ped_fluct_vectorized.npy')
array([], shape=(0, 32, 96), dtype=float64)
Looks empty. Look at another night.
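Since the broken files all come back with shape (0, 32, 96), a quick sweep can list every affected night at once. A sketch; the glob pattern is assumed from the filenames in the listing above:

```python
import glob
import numpy as np

def find_empty_vectorized(pattern):
    """Return the vectorized .npy files whose arrays hold zero frames."""
    empty = []
    for path in sorted(glob.glob(pattern)):
        arr = np.load(path)
        if arr.shape[0] == 0:  # no frames were vectorized for this night
            empty.append(path)
    return empty

# e.g. find_empty_vectorized('fd_ped_vect_nonpadded/*_ped_fluct_vectorized.npy')
```

The ~128-byte file sizes above are consistent with this: that is roughly the size of a bare .npy header with an empty data section.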
fd_ped_vect
y2017m11d28s0 non training data
fd_ped_vect_nonpadded
File :
-rw-r--r-- 1 gfurlich gfurlich 1106048 Dec 4 13:08 y2017m11d28s0_ped_fluct_vectorized.npy
>>> X = np.load('y2017m11d28s0_ped_fluct_vectorized.npy')
>>> X.shape
(45, 32, 96)
>>> X[-1]
array([[0.34257391, 0.34257391, 0.34257391, ..., 0.34257391, 0.34257391,
0.34257391],
[0.34257391, 0.34257391, 0.34257391, ..., 0.34257391, 0.34257391,
0.34257391],
[0.34257391, 0.34257391, 0.34257391, ..., 0.34257391, 0.44569893,
0.34257391],
...,
[0.34257391, 0.34257391, 0.34257391, ..., 0.34257391, 0.34257391,
0.34257391],
[0.34257391, 0.34257391, 0.34257391, ..., 0.34257391, 0.34257391,
0.34257391],
[0.34257391, 0.34257391, 0.34257391, ..., 0.34257391, 0.34257391,
0.44569893]])
Quick plot :
>>> import matplotlib.pyplot as plt
>>> plt.imshow(X[-1], cmap='inferno', vmin=.1, vmax=X[-1].max())
<matplotlib.image.AxesImage object at 0x7f17fc2e4710>
>>> plt.show()
Looks right. Check shape against master db.
Master fd ped part database
Files :
[2019-03-15 10:21:59] $ ls -lrt *.h5
-rw-r--r-- 1 gfurlich gfurlich 3274744 Oct 1 10:58 fd_pedestal_nights_db.h5
-rw-r--r-- 1 gfurlich gfurlich 150787 Oct 1 10:58 pmt_positions.h5
-rw-r--r-- 1 gfurlich gfurlich 16793758 Oct 1 10:58 master_fd_ped_db_by_part.h5
-rw-r--r-- 1 gfurlich gfurlich 2370620 Oct 10 15:09 fd_dark_time_db.h5
[2019-03-15 10:22:13] $ du -sh master_fd_ped_db_by_part.h5
17M master_fd_ped_db_by_part.h5
Load master_fd_ped_db_by_part.h5
>>> import pandas as pd
>>> store = pd.HDFStore('master_fd_ped_db_by_part.h5')
>>> store.info()
u"<class 'pandas.io.pytables.HDFStore'>\nFile path: master_fd_ped_db_by_part.h5\n/master_br_fd_ped_db_by_part frame (shape->[19759,6])\n/master_lr_fd_ped_db_by_part frame (shape->[16493,6])"
>>> df = store['master_br_fd_ped_db_by_part']
>>> df.columns
Index([u'part', u'part_duration', u'part_start', u'part_stop',
u'part_weather_status', u'run_night'],
dtype='object')
Find the last night, y2017m11d28, and print its info :
>>> df[df['run_night'] == pd.to_datetime('2017-11-28').date()]
part part_duration ... part_weather_status run_night
0 5 00:22:21.008656 ... 0 2017-11-28
1 6 00:43:02.009053 ... 0 2017-11-28
2 10 00:25:47.215286 ... 0 2017-11-28
3 11 00:37:24.218904 ... 0 2017-11-28
4 15 00:24:41.500328 ... 0 2017-11-28
6 20 00:23:18.214296 ... 0 2017-11-28
7 21 00:08:32.148072 ... 0 2017-11-28
5 16 00:45:02.840683 ... 0 2017-11-28
>>> df[df['run_night'] == pd.to_datetime('2017-11-28').date()].to_string()
part part_duration part_start part_stop part_weather_status run_night
0 5 00:22:21.008656 2017-11-28 08:45:34.818238216 2017-11-28 09:07:55.826895139 0 2017-11-28
1 6 00:43:02.009053 2017-11-28 09:08:23.289856964 2017-11-28 09:51:25.298910500 0 2017-11-28
2 10 00:25:47.215286 2017-11-28 09:53:48.009802586 2017-11-28 10:19:35.225089529 0 2017-11-28
3 11 00:37:24.218904 2017-11-28 10:20:02.569907793 2017-11-28 10:57:26.788812717 0 2017-11-28
4 15 00:24:41.500328 2017-11-28 10:59:49.881877531 2017-11-28 11:24:31.382205713 0 2017-11-28
6 20 00:23:18.214296 2017-11-28 12:12:24.851270744 2017-11-28 12:35:43.065567492 0 2017-11-28
7 21 00:08:32.148072 2017-11-28 12:36:10.301215743 2017-11-28 12:44:42.449288199 0 2017-11-28
5 16 00:45:02.840683 2017-11-28 11:24:58.684055964 2017-11-28 12:10:01.524739755 0 2017-11-28
I realize I don't have frame counts for each part, but I can infer them as ceil(part_duration in minutes) - 1, since the frames come from the minute-to-minute diffs.
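The ceil-minus-one rule can be written out directly; `frames_in_part` is an illustrative helper, not a function from the scripts:

```python
import math
import pandas as pd

def frames_in_part(part_duration):
    """Frames inferred for a part: frames come from minute-to-minute
    diffs, so a part spanning d minutes yields ceil(d) - 1 frames."""
    minutes = part_duration / pd.Timedelta(minutes=1)
    return math.ceil(minutes) - 1

# Part 16 of y2017m11d28 lasted 00:45:02.840683, i.e. 45.05 minutes,
# giving ceil(45.05) - 1 = 45 frames.
```

Notably, 45 matches the (45, 32, 96) array loaded earlier for that whole night, which is consistent with only a single part having been saved.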
Looking at other dataframe for part info
>>> store2 = pd.HDFStore('fd_pedestal_nights_db.h5')
>>> t = store2.info()
>>> print(t)
<class 'pandas.io.pytables.HDFStore'>
File path: fd_pedestal_nights_db.h5
/br_df frame (shape->[1808,2])
/lr_df frame (shape->[1640,2])
>>> df2 = store2['br_df']
>>> df2.columns
Index([u'ped_status', u'run_night'], dtype='object')
Nothing important there…
Max part duration :
>>> df['part_duration'].max()
Timedelta('0 days 03:36:42.021044')
So 3 x 60 + 36 = 216 is the max number of frames in a part. This matches what I had earlier for padding the vectorized data.
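The padding length can be checked directly from the max duration with the same ceil-minus-one rule used for per-part frame counts:

```python
import math
import pandas as pd

# The longest part in the master db spans 216.7 minutes, so the padded
# frame axis needs ceil(216.7) - 1 = 216 frames.
max_dur = pd.Timedelta('0 days 03:36:42.021044')
max_frames = math.ceil(max_dur / pd.Timedelta(minutes=1)) - 1
print(max_frames)  # 216
```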
Re-vectorize data from each night pedestal dataframes in fd_ped_h5 data
Re-transfer data back to CHPC
Deleted the wrongly aggregated vectorized data arrays on kingspeak under Vectorized_Data, then transferred the data over:
[2019-03-15 11:31:00] $ rsync -av fd_ped_h5 kingspeak:~/weat_ml/Data/
fd_ped_h5/y2018m08d12s1_ped_fluct.h5
sent 16,619,429,753 bytes received 65,437 bytes 98,632,018.93 bytes/sec
total size is 16,615,108,644 speedup is 1.00
$ du -sh *
16G fd_ped_h5
Transfer looks good.
Edit fd_pedestal_data_vectorization.py
Renamed fd_pedestal_rnn_data_vectorization_v_chpc.py to fd_pedestal_data_vectorization.py.
[x] Update to save each part into its own numpy array
[ ] Make sure the frames look correct
[ ] Make sure each is correctly padded
[x] ~Update master database to include pedestal frame info?~ Print frame length to the out file of the processing on CHPC instead
Transferred the new vectorization script to kingspeak.
Proof that past vectorization was wrong
Looking back at the old log file fd_pedestal_rnn_vectorization.out on my CHPC account :
Saving Vectorized and Padded FD Pedestal Data as Numpy Arrays to /scratch/local/u0949991/Data/fd_ped_vect/y2017m11d28s0_ped_fluct_vectorized_padded.npy...
Saving Vectorized and Padded FD Pedestal Data as Numpy Arrays to /scratch/local/u0949991/Data/fd_ped_vect/y2017m11d28s0_ped_fluct_vectorized_padded.npy...
Saving Vectorized and Padded FD Pedestal Data as Numpy Arrays to /scratch/local/u0949991/Data/fd_ped_vect/y2017m11d28s0_ped_fluct_vectorized_padded.npy...
Saving Vectorized and Padded FD Pedestal Data as Numpy Arrays to /scratch/local/u0949991/Data/fd_ped_vect/y2017m11d28s0_ped_fluct_vectorized_padded.npy...
Saving Vectorized and Padded FD Pedestal Data as Numpy Arrays to /scratch/local/u0949991/Data/fd_ped_vect/y2017m11d28s0_ped_fluct_vectorized_padded.npy...
Parts in the same night were not being saved to different files; each part overwrote the same output file, so only the last part survived.
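A hypothetical sketch of the fix (the actual naming scheme in fd_pedestal_data_vectorization.py may differ): put the part number into each output filename so parts from one night stop clobbering each other.

```python
import os
import numpy as np

def save_part(out_dir, night, part, arr):
    """Save one part's vectorized array under a part-specific filename.
    The `_pNN` naming scheme here is illustrative only."""
    path = os.path.join(out_dir, f'{night}_p{part:02d}_ped_fluct_vectorized.npy')
    np.save(path, arr)
    return path
```

With per-part filenames, the repeated "Saving ... y2017m11d28s0_ped_fluct_vectorized_padded.npy" lines in the log would instead show a distinct path per part.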
Submit to CHPC and vectorize data
$ sbatch fd_pedestal_data_vectorization.slm
Submitted batch job 6856344
$ squeue -u u0949991
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
6856344 kingspeak vect_rnn u0949991 PD 0:00 1 (Priority)