My Q-learning optimization is a little bit weird

2 views (last 30 days)
찬목 on 25 Aug 2025
Edited: Cris LaPierre on 25 Aug 2025
Hi, this is my optimization solution using Q-learning to optimize ESS charging and discharging.
The objective is to reduce the total cost of electricity use; to do that, I try to charge the ESS during the cheapest hours and discharge it during the most expensive ones.
The action is the charging/discharging rate, which is applied to the SOC (state of charge), and there are constraints so that the ESS SOC cannot drop below certain thresholds.
Everything else works, but I think the update process is wrong, because after charging or discharging the ESS, the SOC is not updated properly.
Can you help me with this problem?
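The update I intend is roughly the following (a minimal sketch reusing the variable names from my code, not the code itself): clamp the SOC to its feasible band first, then recompute the applied power from the SOC change that actually happened, so the stored SOC and the grid power can never disagree.

% Sketch: clamp SOC first, then back out the power that matches the clamp
SOC_lo = max(SOC_min, p_crt(t));                % effective lower bound at time t
if a_kW >= 0
    SOC_next = SOC + (a_kW / ESS_cap) * eff_cha;
else
    SOC_next = SOC + (a_kW / ESS_cap) / eff_dch;
end
SOC_next = min(max(SOC_next, SOC_lo), SOC_max); % keep SOC inside the feasible band
dSOC = SOC_next - SOC;                          % SOC change that actually happened
if dSOC >= 0
    a_kW = dSOC * ESS_cap / eff_cha;            % power consistent with the clamped SOC
else
    a_kW = dSOC * ESS_cap * eff_dch;
end

In my code below, the clamp instead sets a_kW = 0 while still moving the SOC to the boundary, so the two can get out of sync.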
clc; clear;
%% ===== Environment parameters =====
T = 48;                          % number of time steps (hours)
eff_cha = 0.95;                  % charging efficiency
eff_dch = 0.95;                  % discharging efficiency
SOC_min = 0.1;
SOC_max = 1;
SOC0 = 0.5;                      % initial SOC
P_ess_max = 1500;                % ESS max charge/discharge power (kW)
ESS_cap = 3000;                  % ESS capacity (kWh)
actions = linspace(-0.5,0.5,41); % rate as a fraction of P_ess_max, in [-0.5, 0.5]
numActions = length(actions);
%% ===== Learning parameters =====
alpha = 0.1;
gamma = 0.99;          % not used directly here (Monte Carlo-style update)
epsilon = 0.5;
epsilon_min = 0.05;
epsilon_decay = 0.995;
numEpisodes = 60000;
%% ===== State space (discretized) =====
numSOCs = 101;
numPrices = 3;
Q = zeros(numSOCs, numPrices, T, numActions);
%% ===== Price / load data =====
price_real = 140.5*ones(1,24);   % peak price by default
price_real(1:7) = 87.3;          % off-peak hours
price_real(22:24) = 87.3;
price_real(8:10) = 109.8;        % mid-peak hours
price_real(12) = 109.8;
price_real(18:21) = 109.8;
price_real = [price_real, price_real]; % repeat for 48 hours
load_real = table2array(readtable('48_consumption_6.1.xlsx'));
pv_real = table2array(readtable("PV_gen.xlsx"));
load_real = load_real - pv_real; % net load (kW)
%% ===== SOC threshold (critical load) =====
p_crt_val = 0.03;
for p = 1:24
    p_crt(p) = p_crt_val*(24-p); % linearly decreasing threshold
end
p_crt = [p_crt, p_crt] + 0.03*randn(1,48); % repeat for 48 h, add noise
%% ===== Normalization for state discretization =====
price_norm = price_real / max(price_real);
discretizeState = @(x) min(max(floor(x * numPrices) + 1, 1), numPrices);
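% Example: with numPrices = 3, x in [0, 1/3) -> bin 1, [1/3, 2/3) -> bin 2,
% and [2/3, 1] -> bin 3 (the min/max clamp keeps x = 1 inside the last bin).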
%% ===== Monte Carlo learning loop =====
saving_history = NaN(1,numEpisodes);    % savings of completed episodes only
completion_rate = zeros(1,numEpisodes); % completion rate per episode
for ep = 1:numEpisodes
    SOC = SOC0;          % initial SOC fraction
    episode_memory = []; % rows: [SOC_idx, price_idx, time, action_idx]
    grid_before_ep = zeros(1,T);
    grid_after_ep = zeros(1,T);
    done_flag = true;    % completion flag (never set to false below)
    for t = 1:T
        % current state: [SOC bin, price bin, time index]
        s_idx = [discretizeState(SOC), discretizeState(price_norm(t)), t];
        % ε-greedy action selection
        if rand < epsilon
            a_idx = randi(numActions);
        else
            [~, a_idx] = max(Q(s_idx(1), s_idx(2), s_idx(3), :));
        end
        a_kW = actions(a_idx) * P_ess_max;
        % SOC update
        if a_kW >= 0
            SOC_next = SOC + (a_kW / ESS_cap) * eff_cha;
        else
            SOC_next = SOC + (a_kW / ESS_cap) / eff_dch;
        end
        % clamp when a hard constraint is violated
        if SOC_next > SOC_max
            SOC_next = SOC_max; a_kW = 0;
        elseif SOC_next < SOC_min
            SOC_next = SOC_min; a_kW = 0;
        elseif SOC_next < p_crt(t)
            SOC_next = p_crt(t); a_kW = 0;
        end
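        % (Suspect area: a_kW is zeroed but SOC_next is still moved to the
        % boundary, so the stored SOC and the grid power no longer agree.)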
        % record grid power
        grid_before_ep(t) = load_real(t);
        grid_after_ep(t) = load_real(t) + a_kW;
        % record state/action
        episode_memory(end+1,:) = [s_idx, a_idx];
        SOC = SOC_next;
    end
    % Q update & logging only for completed episodes
    if done_flag && length(episode_memory) == T
        cost_before_ep = sum(grid_before_ep .* price_real);
        cost_after_ep = sum(grid_after_ep .* price_real);
        saving_ep = cost_before_ep - cost_after_ep;
        saving_history(ep) = saving_ep;
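        % Monte Carlo-style update: every (state, action) pair visited in
        % this episode is pulled toward the same episode-level saving; there
        % is no per-step reward or discounting (hence gamma is unused).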
        for step = 1:size(episode_memory,1)
            s_idx = episode_memory(step,1:3);
            a_idx = episode_memory(step,4);
            Q(s_idx(1), s_idx(2), s_idx(3), a_idx) = ...
                Q(s_idx(1), s_idx(2), s_idx(3), a_idx) + ...
                alpha * (saving_ep - Q(s_idx(1), s_idx(2), s_idx(3), a_idx));
        end
    end
    % ε decay
    if epsilon > epsilon_min
        epsilon = epsilon * epsilon_decay;
    end
    % completion-rate bookkeeping
    completion_rate(ep) = sum(~isnan(saving_history)) / ep;
    if mod(ep,10000) == 0
        fprintf("Episode %d: completed=%d, saving=%.2f KRW, ε=%.3f\n", ...
            ep, done_flag, saving_history(ep), epsilon);
    end
end
Episode 10000: completed=1, saving=425253.75 KRW, ε=0.050
Episode 20000: completed=1, saving=582866.25 KRW, ε=0.050
Episode 30000: completed=1, saving=670481.25 KRW, ε=0.050
Episode 40000: completed=1, saving=595275.00 KRW, ε=0.050
Episode 50000: completed=1, saving=678060.00 KRW, ε=0.050
Episode 60000: completed=1, saving=503115.00 KRW, ε=0.050
%% ===== Training performance visualization =====
%% ===== Learned policy (greedy) simulation =====
SOC = SOC0;
SOC_traj = zeros(1,T);
act_traj = zeros(1,T);
grid_power_before = zeros(1,T);
grid_power_after = zeros(1,T);
for t = 1:T
    grid_power_before(t) = load_real(t);
    s_idx = [discretizeState(SOC), discretizeState(price_norm(t)), t];
    [~, a_idx] = max(Q(s_idx(1), s_idx(2), s_idx(3), :)); % greedy action
    a_kW = actions(a_idx) * P_ess_max;
    if a_kW >= 0
        SOC_next = SOC + (a_kW / ESS_cap) * eff_cha;
    else
        SOC_next = SOC + (a_kW / ESS_cap) / eff_dch;
    end
    if SOC_next > SOC_max
        SOC_next = SOC_max; a_kW = 0;
    elseif SOC_next < SOC_min
        SOC_next = SOC_min; a_kW = 0;
    elseif SOC_next < p_crt(t)
        SOC_next = p_crt(t); a_kW = 0;
    end
    grid_power_after(t) = load_real(t) + a_kW;
    SOC_traj(t) = SOC_next;
    act_traj(t) = a_kW;
    SOC = SOC_next;
end
%% ===== Final cost calculation =====
cost_before = sum(grid_power_before .* price_real);
cost_after = sum(grid_power_after .* price_real);
saving = cost_before - cost_after;
fprintf('Electricity cost without the ESS: %.3f KRW\n', cost_before);
Electricity cost without the ESS: 6278802.431 KRW
fprintf('Electricity cost with the ESS: %.3f KRW\n', cost_after);
Electricity cost with the ESS: 5651716.181 KRW
fprintf('Total saving: %.3f KRW (%.2f%% saved)\n', saving, saving/cost_before*100);
Total saving: 627086.250 KRW (9.99% saved)
%% ===== Simulation result visualization =====
figure;
plot(saving_history); title('Learning Curve'); xlabel('Episode'); ylabel('Total Reward'); yticks(-4e5:1e5:9e5); grid on;
figure;
plot(100*SOC_traj,'LineWidth',1); hold on; plot(100*p_crt, 'r','LineWidth',1); title('SOC Trajectory'); ylabel('SOC(%)');ylim([-5 105]);legend('SOC','Critical Load'); grid on;
figure;
stairs(act_traj, '-x'); title('Action Trajectory (kW)'); grid on;
figure;
stairs(price_real); title('Price'); xlabel('Time'); ylabel('Price');

Answers (0)
