TAResearch 2019-03-14
Categories: CHPC, Weather, Machine Learning
Happy Pi Day!!
CNN on the CHPC
Check Loaded Concatenated Array of RCNN data (Looks wrong)
I still think the last frame stored in X is either incorrect or reshaped incorrectly in weather_class_train_model_v_chpc_cnn.py. Looking into this further by printing outputs.
Looking at the shape of the loaded data:
training_data.shape = (3962, 216, 32, 96)
Play around with loading the data on GF-Ultra
du -sh $(locate weather_training_data_v_rnn.npy)
20G /GDF/TAResearch/CHPC/weat_ml/Vectorized_Data/weather_training_data_v_rnn.npy
20G /GDF/TAResearch/FD_Ped_Weather/weather_ml/weather_training_data_v_rnn.npy
Tried to load it, but I ran out of memory. Look into getting an interactive node via Slurm on the CHPC. Video of a Slurm interactive node.
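As a possible workaround for the memory issue on GF-Ultra (just a sketch, assuming the same relative path as above), numpy can memory-map the .npy file instead of reading the full 20G into RAM; with mmap_mode='r' only the slices that are actually indexed get read from disk:

```python
import numpy as np

# Memory-map the 20G training array instead of loading it all into RAM.
# Only the slices indexed below are actually read from disk.
X = np.load('Vectorized_Data/weather_training_data_v_rnn.npy', mmap_mode='r')

print(X.shape)    # expect (3962, 216, 32, 96)
print(X[-1][-1])  # inspect the last frame of the last sample without a full load
```

Still worth getting an interactive node on the CHPC for anything heavier than spot checks.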
Info from UofU CHPC Website
srun --time=1:00:00 --ntasks 2 --nodes=1 --account=chpc --partition=ember --pty /bin/tcsh -l
or
srun -t 1:00:00 -n 2 -N 1 -A chpc -p ember --pty /bin/tcsh -l
From my .slm scripts, the account and partition I want to use are:
#SBATCH --account=bergman
#SBATCH --partition=kingspeak
Running on CHPC interactively.
srun -t 1:00:00 -n 1 -N 1 -A bergman -p kingspeak --pty /bin/bash -l
srun: job 6854300 queued and waiting for resources
srun: job 6854300 has been allocated resources
[u0949991@kp004 weat_ml]$ python
Python 2.7.5 (default, Oct 30 2018, 23:45:53)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> X = np.load('Vectorized_Data/weather_training_data_v_rnn.npy')
Looking at the Loaded Data
>>> X[0][0]
array([[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
...,
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.]])
>>> X[-1][-1]
array([[ 0.42105824, 0.32363453, 0.47804745, ..., 0.57547115,
0.57547115, 0.32363453],
[ 0.32363453, 0.47804745, 0.32363453, ..., 0.59713744,
0.61590564, 0.68414507],
[ 0.51848194, 0.32363453, 0.42105824, ..., 0.69456115,
0.7218503 , 0.70425828],
...,
[ 0.42105824, 0.32363453, 0.32363453, ..., 0.32363453,
0.32363453, 0.32363453],
[ 0.42105824, 0.32363453, 0.42105824, ..., 0.32363453,
0.32363453, 0.32363453],
[ 0.32363453, 0.32363453, 0.32363453, ..., 0.47804745,
0.32363453, 0.32363453]])
>>> X[1][-1]
array([[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
...,
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.]])
>>> X[2][-1]
array([[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
...,
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.]])
>>> X[3][-1]
array([[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
...,
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.]])
And these are all duplicates…
>>> X[13][-1]
array([[ 0.49096629, 0.33238049, 0.33238049, ..., 0.33238049,
0.33238049, 0.33238049],
[ 0.33238049, 0.33238049, 0.33238049, ..., 0.33238049,
0.33238049, 0.33238049],
[ 0.33238049, 0.33238049, 0.33238049, ..., 0.33238049,
0.33238049, 0.33238049],
...,
[ 0.33238049, 0.33238049, 0.33238049, ..., 0.33238049,
0.33238049, 0.33238049],
[ 0.33238049, 0.33238049, 0.33238049, ..., 0.33238049,
0.33238049, 0.33238049],
[ 0.33238049, 0.33238049, 0.33238049, ..., 0.33238049,
0.33238049, 0.33238049]])
>>> X[14][-1]
array([[ 0.49096629, 0.33238049, 0.33238049, ..., 0.33238049,
0.33238049, 0.33238049],
[ 0.33238049, 0.33238049, 0.33238049, ..., 0.33238049,
0.33238049, 0.33238049],
[ 0.33238049, 0.33238049, 0.33238049, ..., 0.33238049,
0.33238049, 0.33238049],
...,
[ 0.33238049, 0.33238049, 0.33238049, ..., 0.33238049,
0.33238049, 0.33238049],
[ 0.33238049, 0.33238049, 0.33238049, ..., 0.33238049,
0.33238049, 0.33238049],
[ 0.33238049, 0.33238049, 0.33238049, ..., 0.33238049,
0.33238049, 0.33238049]])
>>> X[15][-1]
array([[ 0.49096629, 0.33238049, 0.33238049, ..., 0.33238049,
0.33238049, 0.33238049],
[ 0.33238049, 0.33238049, 0.33238049, ..., 0.33238049,
0.33238049, 0.33238049],
[ 0.33238049, 0.33238049, 0.33238049, ..., 0.33238049,
0.33238049, 0.33238049],
...,
[ 0.33238049, 0.33238049, 0.33238049, ..., 0.33238049,
0.33238049, 0.33238049],
[ 0.33238049, 0.33238049, 0.33238049, ..., 0.33238049,
0.33238049, 0.33238049],
[ 0.33238049, 0.33238049, 0.33238049, ..., 0.33238049,
0.33238049, 0.33238049]])
Checking if the arrays are the same…
>>> np.array_equal(X[-1][-1],X[-2][-1])
True
>>> np.array_equal(X[-1][-1],X[-3][-1])
True
>>> np.array_equal(X[-1][-1],X[-4][-1])
True
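To quantify how widespread the duplication is instead of spot-checking indices by hand, a quick sketch (assuming the same memory-mapped load as above; the np.unique(..., axis=0) call needs a reasonably recent numpy):

```python
import numpy as np

X = np.load('Vectorized_Data/weather_training_data_v_rnn.npy', mmap_mode='r')

# Count consecutive samples whose last frame matches the previous sample's last frame.
dupes = sum(np.array_equal(X[i][-1], X[i - 1][-1]) for i in range(1, X.shape[0]))
print('consecutive duplicate last frames: %d of %d' % (dupes, X.shape[0] - 1))

# Number of distinct last frames across all samples.
last_frames = np.asarray(X[:, -1]).reshape(X.shape[0], -1)
print('distinct last frames: %d' % np.unique(last_frames, axis=0).shape[0])
```

If the distinct count comes out far below 3962, that would back up the concatenation/overwrite suspicion.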
Looking for original vectorized data
So something is wrong with the training data… Maybe it was not concatenated correctly. Look back at the vectorized padded data and the vectorized non-padded data. The data is in
/GDF/TAResearch/FD_Ped_Weather/Data/fd_ped_vect_nonpadded/
Size
[2019-03-14 16:42:36] gfurlich@GF-ULTRA:/GDF/TAResearch/FD_Ped_Weather/Data$ du -sh fd_ped_
fd_ped_h5/ fd_ped_vect/ fd_ped_vect_nonpadded/
[2019-03-14 16:42:36] gfurlich@GF-ULTRA:/GDF/TAResearch/FD_Ped_Weather/Data$ du -sh fd_ped_vect_nonpadded/
1.2G fd_ped_vect_nonpadded/
Date Created
[2019-03-14 16:45:56] gfurlich@GF-ULTRA:/GDF/TAResearch/FD_Ped_Weather/Data$ ls -lrt fd_ped_vect_nonpadded/
...
-rw-r--r-- 1 gfurlich gfurlich 1106048 Dec 4 13:08 y2017m11d28s0_ped_fluct_vectorized.npy
Amount of data
[2019-03-14 16:45:56] gfurlich@GF-ULTRA:/GDF/TAResearch/FD_Ped_Weather/Data$ ls -lrt fd_ped_vect_nonpadded/ | wc -l
1787
This represents the 1787 nights of data for BR. This is not just the training data.
**I realize that the data saved from vectorizing is being saved by night and not by part.** Thus much of the data is being overwritten by the last part in the night… so that is why it is wrong. I need to go back to the raw data in `/GDF/TAResearch/FD_Ped_Weather/Data/fd_ped_h5/`.
$ ls -lrt fd_ped_h5
...
-rw-r--r-- 1 gfurlich gfurlich 7375770 Sep 28 11:47 y2009m03d24s0_ped_fluct.h5
-rw-r--r-- 1 gfurlich gfurlich 7545368 Sep 28 11:47 y2009m03d25s0_ped_fluct.h5
-rw-r--r-- 1 gfurlich gfurlich 4878054 Sep 28 11:47 y2009m04d18s0_ped_fluct.h5
-rw-r--r-- 1 gfurlich gfurlich 11960654 Oct 16 16:41 y2017m02d25s0_ped_fluct.h5
-rw-r--r-- 1 gfurlich gfurlich 1032 Oct 24 10:25 y2016m07d15s1_ped_fluct.h5
-rw-r--r-- 1 gfurlich gfurlich 1032 Oct 24 10:35 y2015m11d01s1_ped_fluct.h5
Why is there data from Oct 24? It looks empty… Will look at it… Since the pedestal data has all parts saved in one .dst file, I saved all parts of a night in one .h5 file. Now I need to save each part from a night into its own padded array…
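A minimal sketch of that re-vectorization step, assuming each night's .h5 file holds one dataset per part with shape (n_frames, 32, 96); the output directory name, the dataset keys, and the pad length of 216 are placeholders here, not the actual layout of the fd_ped_h5 files:

```python
import glob
import os

import h5py
import numpy as np

H5_DIR = '/GDF/TAResearch/FD_Ped_Weather/Data/fd_ped_h5/'
OUT_DIR = '/GDF/TAResearch/FD_Ped_Weather/Data/fd_ped_vect_by_part/'  # hypothetical output dir
MAX_FRAMES = 216  # pad length taken from training_data.shape

for h5_path in sorted(glob.glob(os.path.join(H5_DIR, '*_ped_fluct.h5'))):
    night = os.path.basename(h5_path).replace('_ped_fluct.h5', '')
    with h5py.File(h5_path, 'r') as f:
        # Assumed layout: one dataset per part, each of shape (n_frames, 32, 96).
        for part_key in f.keys():
            frames = np.asarray(f[part_key])
            # Zero-pad (or truncate) each part to a fixed number of frames.
            padded = np.zeros((MAX_FRAMES,) + frames.shape[1:], dtype=frames.dtype)
            n = min(len(frames), MAX_FRAMES)
            padded[:n] = frames[:n]
            # One output file per part, so a later part no longer overwrites an earlier one.
            out_name = '{0}_{1}_ped_fluct_vectorized.npy'.format(night, part_key)
            np.save(os.path.join(OUT_DIR, out_name), padded)
```

Each night then contributes one padded array per part, and the concatenation step can pick them up individually.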